alknet-firewall/docs/architecture/configuration.md
glm-5.1 7d8a39a88a docs: resolve 4 open questions, add research, spec codebook package structure
Research-driven resolution of OQ-01, OQ-02, OQ-05, OQ-06:

- OQ-01: Remove ONNX Runtime from scope entirely — doesn't support
  activation extraction natively (optimum #972 closed as not planned),
  bloated model exports; burn/cublas via safetensors is a better future path

- OQ-02: Codebook compresses ~65% (1,245 → 500-600 lines); add Package
  Structure and Extraction from PoC sections to codebook.md based on PoC
  analysis of metaspline firewall_codebook.py

- OQ-05: Standalone API + thin adapter pattern (ADR-011); Phase 1 ships
  Firewall.screen() only, Phase 2 adds <100-line adapter packages for
  LlamaFirewall, OpenAI Agents SDK, NeMo Guardrails

- OQ-06: TOML for file-based config — standard modern Python, two-way door

Also: research OQ-03 rolling windows from taskgraph-semantic reference code,
remove onnxruntime/optimum from dependencies, move streaming screening to
Phase 2, add burn/cublas as Phase 3 alternative backend.
2026-06-13 07:27:40 +00:00


---
status: draft
last_updated: 2026-06-13
---
# Configuration
Configuration for the firewall: model selection, detection thresholds,
alarm levels, and operational parameters.
## What It Is
The configuration component defines all tunable parameters for the firewall.
It controls which model is used, how aggressively inputs are screened, and
what alarm levels map to what scores.
## Why It Exists
Different deployment contexts need different detection sensitivity. A
high-security environment (e.g., screening inputs to a system with access to
sensitive data) may want aggressive thresholds that flag more suspicious
inputs. A low-risk chatbot may prefer permissive thresholds that minimize
false positives. The configuration component makes these trade-offs explicit
and tunable.
## Configuration Structure
### Thresholds
```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    suspicious: float = 0.3  # Score above which input is SUSPICIOUS
    dangerous: float = 0.7   # Score above which input is DANGEROUS
    per_dimension: dict[int, float] | None = None  # Override per SVD dimension
```
Default thresholds are calibrated against the codebook's behavioral regions
and shipped with each codebook. The working assumption is that, once
calibrated, different models produce similar behavioral patterns. This is
inspired by the "platonic representation hypothesis", the idea that different
models converge on similar internal representations. Per-dimension overrides
allow tuning sensitivity for specific behavioral patterns (e.g., a lower
threshold on the refusal-suppression dimension). Users can always override
the codebook's recommended thresholds.
### Model Configuration
```python
from dataclasses import dataclass, field


@dataclass
class ModelConfig:
    model_id: str = "HuggingFaceTB/SmolLM2-135M"
    revision: str = "<pinned-commit>"  # Specific commit, not "main"
    device: str = "cpu"
    extraction_layers: list[int] = field(default_factory=lambda: [1, 2, 4, 8])
    cache_dir: str | None = None
```
Extraction layers are chosen based on EMNLP 2024 findings that safety signals
appear in early layers. The default set (1, 2, 4, 8) is concentrated in the
early part of the 30-layer SmolLM2-135M model.
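Because `extraction_layers` is user-configurable, a check against the loaded model's depth may be useful. The sketch below is one possible guard; the helper name `validate_extraction_layers` is hypothetical. It assumes layers are 1-indexed transformer blocks, with index 0 reserved for the embedding output, as in the `hidden_states` tuple returned by Hugging Face Transformers.

```python
def validate_extraction_layers(layers: list[int], num_hidden_layers: int) -> list[int]:
    """Return sorted, de-duplicated layers, or raise if any index is out of range."""
    bad = [i for i in layers if not 1 <= i <= num_hidden_layers]
    if bad:
        raise ValueError(f"extraction layers {bad} are outside 1..{num_hidden_layers}")
    return sorted(set(layers))


# A toy 6-layer model cannot serve layer 8, so the default set is rejected.
try:
    validate_extraction_layers([1, 2, 4, 8], num_hidden_layers=6)
    rejected = False
except ValueError:
    rejected = True
```

In practice `num_hidden_layers` would come from the loaded model's configuration rather than being hard-coded.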
### Codebook Configuration
```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CodebookConfig:
    source: str = "bundled"       # "bundled" | "hf_hub" | "local"
    repo_id: str | None = None    # HuggingFace repo if source="hf_hub"
    revision: str | None = None   # HuggingFace revision
    path: Path | None = None      # Local path if source="local"
    n_dimensions: int = 10        # Number of SVD dimensions to retain
```
### Full Configuration
```python
from dataclasses import dataclass, field


@dataclass
class FirewallConfig:
    model: ModelConfig = field(default_factory=ModelConfig)
    codebook: CodebookConfig = field(default_factory=CodebookConfig)
    thresholds: Thresholds = field(default_factory=Thresholds)
```
## Defaults
All configuration has sensible defaults. The firewall works out of the box:
```python
# All defaults
firewall = Firewall()
alarm = firewall.screen("Hello, how are you?")
# alarm.level == AlarmLevel.CLEAR
```
No configuration file is required. All parameters can be passed via the
constructor. A future phase may add file-based configuration (TOML, consistent
with Python packaging conventions and `pyproject.toml`).
## Design Decisions
| ADR | Decision | Summary |
|-----|----------|---------|
| [003](decisions/003-small-model-detector.md) | Small model detector | Defaults to SmolLM2-135M |
| [006](decisions/006-optional-pytorch.md) | Optional PyTorch | Device config allows CPU-only |
| [007](decisions/007-runtime-model-download.md) | Runtime download | Model revision must be pinned |
## Open Questions
Open questions are tracked in [open-questions.md](open-questions.md). Key
questions affecting this document:
- ~~**OQ-04**~~: ~~Should detection thresholds be per-model or globally configurable?~~ (resolved — both: model-specific defaults shipped with codebook, user-overridable)
- ~~**OQ-06**~~: ~~Should file-based configuration use TOML or YAML?~~ (resolved — TOML, consistent with modern Python packaging)