alknet-firewall/docs/architecture/decisions/010-monotonic-spline-distributions.md
glm-5.1 cf464c2296 feat: initial architecture specification and research
Phase 0→1 setup for alknet-firewall — a behavioral signal detection
library that screens untrusted LLM inputs using small model activations.

Architecture docs (5 specs, 10 ADRs, 7 open questions):
- overview: vision, scope, dependencies, package structure
- firewall: core API, alarm protocol, score composition, error handling
- codebook: SVD basis, spline distributions, calibration, tensor format
- model: activation extraction, model-agnostic interface, lazy loading
- configuration: thresholds, model selection, detection tuning

Research reports:
- modern-python-project-setup: uv, pyproject.toml, src layout, ruff, CI
- python-ml-packaging: optional PyTorch, HF Hub download, safetensors
- llm-input-safety-landscape: threat taxonomy, defenses, academic evidence

Agent role adaptations for Python project (replaced Rust conventions).
2026-06-13 05:17:40 +00:00

# ADR-010: Monotonic Spline Distributions for Behavioral Region Modeling
## Status
Accepted
## Context
After projecting activations onto SVD dimensions, the firewall needs to score
how "normal" or "anomalous" a projection is relative to the distribution of
normal inputs. This requires modeling the probability density of normal inputs
along each dimension.
Alternatives:
- **Gaussian**: Simple, well-understood. But real behavioral distributions are
often skewed, multimodal, or heavy-tailed. Gaussian assumes symmetry.
- **Kernel Density Estimation (KDE)**: Non-parametric, flexible. But
bandwidth selection is tricky, and KDE doesn't provide a parametric form for
efficient storage and fast evaluation.
- **Mixture of Gaussians**: More flexible than single Gaussian. But requires
choosing the number of components and risks overfitting.
- **Empirical CDF**: Non-parametric, no assumptions. But requires storing all
calibration data points — not compact.
- **Monotonic spline distributions**: Parametric CDF modeled as a monotonic
spline. Compact (handful of knots), smooth, tail-sensitive, and
differentiable. Constraining the spline to be monotonic keeps the CDF
non-decreasing, so the implied density is never negative.
## Decision
Use monotonic spline distributions to model behavioral regions along each SVD
dimension. The CDF is represented as a monotonic cubic spline with a small
number of knots (typically 10 to 20 per dimension). Tail behavior uses
exponential decay beyond the observed range.
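
As an illustration of the representation described above, here is a minimal pure-Python sketch. It is not the metaspline PoC code: the class name `MonotoneSplineCDF`, the harmonic-mean tangent rule (a Fritsch-Carlson-style choice), and the way the tail rates are matched to the end slopes are all assumptions made for this example.

```python
import bisect
import math


class MonotoneSplineCDF:
    """Hypothetical sketch: monotone cubic Hermite CDF with exponential tails."""

    def __init__(self, xs, ps):
        # xs: strictly increasing knot positions.
        # ps: CDF values at the knots, non-decreasing and strictly inside (0, 1).
        xs, ps = list(xs), list(ps)
        if len(xs) != len(ps) or len(xs) < 2:
            raise ValueError("need at least two (x, p) knots")
        if any(b <= a for a, b in zip(xs, xs[1:])):
            raise ValueError("xs must be strictly increasing")
        if any(b < a for a, b in zip(ps, ps[1:])) or ps[0] <= 0 or ps[-1] >= 1:
            raise ValueError("ps must be non-decreasing and inside (0, 1)")
        self.xs, self.ps = xs, ps
        self.h = [b - a for a, b in zip(xs, xs[1:])]
        d = [(pb - pa) / h for pa, pb, h in zip(ps, ps[1:], self.h)]
        # Knot tangents: harmonic mean of the neighbouring secant slopes.
        # It never exceeds twice the smaller slope, which satisfies the
        # Fritsch-Carlson sufficient condition for a monotone interpolant.
        m = [d[0]]
        for left, right in zip(d, d[1:]):
            m.append(0.0 if left * right <= 0 else 2 * left * right / (left + right))
        m.append(d[-1])
        self.m = m
        # Tail rates chosen so the slope is continuous at the outer knots.
        tiny = 1e-12  # guards against a flat end segment
        self.rate_lo = max(m[0], tiny) / ps[0]
        self.rate_hi = max(m[-1], tiny) / (1.0 - ps[-1])

    def cdf(self, x):
        xs, ps = self.xs, self.ps
        if x <= xs[0]:  # lower tail decays exponentially towards 0
            return ps[0] * math.exp(self.rate_lo * (x - xs[0]))
        if x >= xs[-1]:  # upper tail approaches 1 exponentially
            return 1.0 - (1.0 - ps[-1]) * math.exp(-self.rate_hi * (x - xs[-1]))
        i = bisect.bisect_right(xs, x) - 1  # segment with xs[i] <= x < xs[i + 1]
        h = self.h[i]
        t = (x - xs[i]) / h
        # Cubic Hermite basis functions on the unit interval.
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        return (h00 * ps[i] + h10 * h * self.m[i]
                + h01 * ps[i + 1] + h11 * h * self.m[i + 1])


# Illustrative knots only; real knots would come from calibration data.
example_cdf = MonotoneSplineCDF([-2.0, -1.0, 0.0, 1.0, 2.0],
                                [0.02, 0.16, 0.50, 0.84, 0.98])
print(example_cdf.cdf(0.0), example_cdf.cdf(2.5), example_cdf.cdf(-4.0))
```

Only the knot positions and their CDF values need to be stored per dimension; the tangents and tail rates are recomputed from them at load time in this sketch.
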
The scoring function computes how far a projection falls in the tail of the
distribution — projections well within the normal region score low (CLEAR),
projections near or beyond the tail score increasingly high.
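
A scoring function of this kind could look like the sketch below. The ADR does not fix a formula, so everything here is an assumption for illustration: the name `tail_score`, the two-sided tail probability, the log scaling, and the `floor` parameter that caps the score at 1.

```python
import math


def tail_score(u, floor=1e-6):
    """Map a CDF value u = F(x) to a score in [0, 1].

    Hypothetical mapping: 0 at the median, rising smoothly towards 1 as the
    projection moves into either tail. `floor` is the tail probability at
    which the score saturates.
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must be a CDF value in [0, 1]")
    if not 0.0 < floor < 1.0:
        raise ValueError("floor must be in (0, 1)")
    # Two-sided tail probability: 1 at the median, near 0 deep in a tail.
    p_tail = max(2.0 * min(u, 1.0 - u), floor)
    # Log scaling keeps the bulk of the distribution close to 0 while still
    # separating rare events (1e-3 vs 1e-5) that a linear scale would merge.
    return math.log(1.0 / p_tail) / math.log(1.0 / floor)


for u in (0.5, 0.9, 0.99, 0.9999):
    print(u, round(tail_score(u), 3))
```

Because the mapping is continuous in `u`, small changes in a projection produce small changes in the score, which matches the smooth-scoring goal; how the score is cut into alarm levels is left to the firewall configuration.
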
## Consequences
**Positive**:
- **Smooth scoring**: Continuous score rather than hard threshold, avoiding
cliff-edge behavior
- **Tail sensitivity**: Exponential tails capture rare-but-critical anomalous
inputs without flagging the bulk of normal inputs
- **Parametric compactness**: A handful of spline knots (10 to 20) represent the
full distribution shape. Very small storage footprint.
- **Differentiability**: Scores are differentiable — potential for future
adversarial training or gradient-based analysis
- **Weak distributional assumptions**: Unlike Gaussian, spline distributions
handle skew, heavy tails, and non-standard shapes within the observed range;
only the extrapolated tails are assumed to decay exponentially
**Negative**:
- More complex than Gaussian — requires spline fitting during codebook
compilation
- Spline knot selection affects scoring quality — poor knot placement can
miss important distribution features
- Less familiar to most ML practitioners than Gaussian or KDE
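
One simple way to reduce the knot-placement risk noted above is to place knots at empirical quantiles of the calibration projections, so that knots follow the data rather than a fixed grid. The sketch below is an assumption for illustration, not the codebook compiler's actual procedure; `quantile_knots`, `n_knots`, `p_lo`, and `p_hi` are hypothetical names.

```python
import random


def quantile_knots(samples, n_knots=12, p_lo=0.01, p_hi=0.99):
    """Return (xs, ps): knot positions at evenly spaced probability levels.

    Hypothetical helper. Positions are linearly interpolated empirical
    quantiles of the calibration samples for one SVD dimension.
    """
    data = sorted(samples)
    if len(data) < 2 or n_knots < 2:
        raise ValueError("need at least two samples and two knots")
    if not 0.0 < p_lo < p_hi < 1.0:
        raise ValueError("require 0 < p_lo < p_hi < 1")
    xs, ps = [], []
    for k in range(n_knots):
        p = p_lo + (p_hi - p_lo) * k / (n_knots - 1)
        pos = p * (len(data) - 1)          # fractional index into sorted data
        lo = int(pos)
        hi = min(lo + 1, len(data) - 1)
        x = data[lo] + (pos - lo) * (data[hi] - data[lo])
        if xs and x <= xs[-1]:
            continue  # skip ties so knot positions stay strictly increasing
        xs.append(x)
        ps.append(p)
    return xs, ps


# Synthetic stand-in for calibration projections along one dimension.
random.seed(0)
calibration = [random.gauss(0.0, 1.0) for _ in range(5000)]
xs, ps = quantile_knots(calibration)
print(len(xs), xs[0], xs[-1])
```

Evenly spaced probability levels put most knots in the bulk of the distribution; a variant that spaces levels more densely near `p_lo` and `p_hi` would trade bulk resolution for tail resolution, which may matter more for anomaly scoring.
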
## References
- [codebook.md](../codebook.md)
- metaspline PoC: `spline.py`, `transform.py`, `space.py` (~280 lines total)