alknet-firewall/docs/architecture/decisions/010-monotonic-spline-distributions.md

# ADR-010: Monotonic Spline Distributions for Behavioral Region Modeling

## Status

Accepted

## Context

After projecting activations onto SVD dimensions, the firewall needs to score
how "normal" or "anomalous" a projection is relative to the distribution of
normal inputs. This requires modeling the probability density of normal inputs
along each dimension.

Alternatives:
- **Gaussian**: Simple, well-understood. But real behavioral distributions are
  often skewed, multimodal, or heavy-tailed, whereas a Gaussian is symmetric,
  unimodal, and light-tailed.
- **Kernel Density Estimation (KDE)**: Non-parametric, flexible. But
  bandwidth selection is tricky, and a KDE has no compact parametric form: it
  must keep the calibration samples, and each evaluation costs time
  proportional to their number.
- **Mixture of Gaussians**: More flexible than single Gaussian. But requires
  choosing the number of components and risks overfitting.
- **Empirical CDF**: Non-parametric, no assumptions. But requires storing all
  calibration data points — not compact.
- **Monotonic spline distributions**: Parametric CDF modeled as a monotonic
  spline. Compact (a handful of knots), smooth, tail-sensitive, and
  differentiable. Constraining the spline to be non-decreasing keeps the
  implied density non-negative, so the fit is always a valid probability
  distribution.

## Decision

Use monotonic spline distributions to model behavioral regions along each SVD
dimension. The CDF is represented as a monotonic cubic spline with a small
number of knots (typically 10–20 per dimension). Tail behavior uses
exponential decay beyond the observed range.
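
The representation above can be sketched as follows. This is a minimal
illustration, not the metaspline PoC's code: it assumes a Fritsch-Carlson
style monotone cubic Hermite interpolant between knots, and exponential tails
whose decay rates are chosen so the density is continuous at the first and
last knot. The names `monotone_slopes` and `SplineCDF`, and the example knots,
are hypothetical.

```python
import bisect
import math


def monotone_slopes(xs, ps):
    """Knot derivatives that keep a cubic Hermite interpolant non-decreasing."""
    h = [b - a for a, b in zip(xs, xs[1:])]
    d = [(ps[i + 1] - ps[i]) / h[i] for i in range(len(h))]  # secant slopes
    m = [0.0] * len(xs)
    m[0], m[-1] = d[0], d[-1]  # one-sided estimates at the boundaries
    for i in range(1, len(xs) - 1):
        # The harmonic mean is at most twice the smaller neighbouring secant,
        # which satisfies the Fritsch-Carlson sufficient condition (ratio <= 3).
        m[i] = 2.0 * d[i - 1] * d[i] / (d[i - 1] + d[i])
    return m


class SplineCDF:
    """Hypothetical container: CDF knots plus assumed exponential tails."""

    def __init__(self, xs, ps):
        if any(b <= a for a, b in zip(xs, xs[1:])):
            raise ValueError("knot positions must be strictly increasing")
        if ps[0] <= 0.0 or ps[-1] >= 1.0 or any(b <= a for a, b in zip(ps, ps[1:])):
            raise ValueError("CDF values must be strictly increasing inside (0, 1)")
        self.xs, self.ps = list(xs), list(ps)
        self.m = monotone_slopes(self.xs, self.ps)
        # Assumption: tail decay rates match the boundary slopes, so the
        # density has no jump where the spline hands over to the tail.
        self.rate_lo = self.m[0] / self.ps[0]
        self.rate_hi = self.m[-1] / (1.0 - self.ps[-1])

    def cdf(self, x):
        xs, ps, m = self.xs, self.ps, self.m
        if x < xs[0]:  # left tail: F decays exponentially toward 0
            return ps[0] * math.exp(self.rate_lo * (x - xs[0]))
        if x > xs[-1]:  # right tail: 1 - F decays exponentially toward 0
            return 1.0 - (1.0 - ps[-1]) * math.exp(-self.rate_hi * (x - xs[-1]))
        i = min(bisect.bisect_right(xs, x) - 1, len(xs) - 2)
        h = xs[i + 1] - xs[i]
        t = (x - xs[i]) / h
        # Cubic Hermite basis functions on the unit interval.
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        return h00 * ps[i] + h10 * h * m[i] + h01 * ps[i + 1] + h11 * h * m[i + 1]


# Illustrative knots only; real knots would come from calibration data.
knots_x = [-2.0, -1.0, 0.0, 1.0, 3.0]
knots_p = [0.02, 0.15, 0.50, 0.80, 0.98]
dist = SplineCDF(knots_x, knots_p)
```

Under these assumptions each dimension is stored as two short lists of
floats, and evaluation is a binary search plus one cubic polynomial.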

The scoring function computes how far a projection falls in the tail of the
distribution — projections well within the normal region score low (CLEAR),
projections near or beyond the tail score increasingly high.
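
One way to turn a CDF value into such a score is sketched below. The ADR does
not fix the exact formula, so this is an assumption: the score is the negative
log of the two-sided tail probability, which is 0 at the median and grows as a
projection moves into either tail. The function `tail_score` is hypothetical,
and `logistic_cdf` is only a stand-in for the fitted spline CDF.

```python
import math


def tail_score(p, floor=1e-300):
    """Anomaly score from a CDF value p = F(x): 0 at the median, larger in either tail."""
    tail = 2.0 * min(p, 1.0 - p)  # two-sided tail probability in [0, 1]
    # The floor avoids log(0) when p rounds to exactly 0 or 1.
    return max(0.0, -math.log10(max(tail, floor)))


def logistic_cdf(x):
    """Stand-in CDF for the demo; the firewall would use the fitted spline CDF."""
    return 1.0 / (1.0 + math.exp(-x))


# Scores rise smoothly as the projection moves away from the bulk.
for x in (0.0, 1.0, 3.0, 6.0):
    print(f"x={x:4.1f}  score={tail_score(logistic_cdf(x)):.3f}")
```

With an exponential tail, this log-based score grows roughly linearly with
distance beyond the observed range, which fits the "increasingly high"
behavior described above. Note that `1.0 - p` loses precision when `p` is
extremely close to 1, so a production version might evaluate the upper tail
directly.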

## Consequences

**Positive**:
- **Smooth scoring**: A continuous score rather than a hard threshold,
  avoiding cliff-edge behavior
- **Tail sensitivity**: Exponential tails capture rare-but-critical anomalous
  inputs without flagging the bulk of normal inputs
- **Parametric compactness**: A handful of spline knots (10–20) represent the
  full distribution shape. Very small storage footprint.
- **Differentiability**: Scores are differentiable — potential for future
  adversarial training or gradient-based analysis
- **No fixed shape assumption**: Unlike a Gaussian, spline distributions
  handle skew, heavy tails within the observed range, and non-standard shapes

**Negative**:
- More complex than Gaussian — requires spline fitting during codebook
  compilation
- Spline knot selection affects scoring quality — poor knot placement can
  miss important distribution features
- Less familiar to most ML practitioners than Gaussian or KDE
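
The knot-placement risk noted above can be reduced by deriving knots from the
calibration data instead of spacing them evenly along the axis. The sketch
below assumes one common heuristic, knots at evenly spaced empirical
quantiles; it is not necessarily what the codebook compiler does, and
`quantile_knots`, `n_knots`, and `tail_mass` are hypothetical names.

```python
import random


def quantile_knots(samples, n_knots=12, tail_mass=0.01):
    """Return (x, p) knots at evenly spaced quantiles in [tail_mass, 1 - tail_mass]."""
    data = sorted(samples)
    n = len(data)
    knots = []
    for k in range(n_knots):
        p = tail_mass + (1.0 - 2.0 * tail_mass) * k / (n_knots - 1)
        pos = p * (n - 1)  # fractional index into the sorted samples
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        x = data[lo] + (pos - lo) * (data[hi] - data[lo])  # linear interpolation
        if knots and x <= knots[-1][0]:
            continue  # skip ties so knot positions stay strictly increasing
        knots.append((x, p))
    return knots


# Skewed synthetic data standing in for calibration projections.
rng = random.Random(0)
calibration = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]
knots = quantile_knots(calibration)
```

Quantile spacing concentrates knots where the data is dense and leaves the
tails sparsely covered, so a variant with extra knots near `tail_mass` and
`1 - tail_mass` may be worth evaluating if tail scoring looks coarse.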

## References

- [codebook.md](../codebook.md)
- metaspline PoC: `spline.py`, `transform.py`, `space.py` (~280 lines total)