alknet-firewall/docs/architecture/decisions/010-monotonic-spline-distributions.md
glm-5.1 cf464c2296 feat: initial architecture specification and research
Phase 0→1 setup for alknet-firewall — a behavioral signal detection
library that screens untrusted LLM inputs using small model activations.

Architecture docs (5 specs, 10 ADRs, 7 open questions):
- overview: vision, scope, dependencies, package structure
- firewall: core API, alarm protocol, score composition, error handling
- codebook: SVD basis, spline distributions, calibration, tensor format
- model: activation extraction, model-agnostic interface, lazy loading
- configuration: thresholds, model selection, detection tuning

Research reports:
- modern-python-project-setup: uv, pyproject.toml, src layout, ruff, CI
- python-ml-packaging: optional PyTorch, HF Hub download, safetensors
- llm-input-safety-landscape: threat taxonomy, defenses, academic evidence

Agent role adaptations for Python project (replaced Rust conventions).
2026-06-13 05:17:40 +00:00


ADR-010: Monotonic Spline Distributions for Behavioral Region Modeling

Status

Accepted

Context

After projecting activations onto SVD dimensions, the firewall needs to score how "normal" or "anomalous" a projection is relative to the distribution of normal inputs. This requires modeling the probability density of normal inputs along each dimension.

Alternatives:

  • Gaussian: Simple, well-understood. But real behavioral distributions are often skewed, multimodal, or heavy-tailed. Gaussian assumes symmetry.
  • Kernel Density Estimation (KDE): Non-parametric, flexible. But bandwidth selection is tricky, and KDE doesn't provide a parametric form for efficient storage and fast evaluation.
  • Mixture of Gaussians: More flexible than single Gaussian. But requires choosing the number of components and risks overfitting.
  • Empirical CDF: Non-parametric, no assumptions. But requires storing all calibration data points — not compact.
  • Monotonic spline distributions: The CDF is modeled as a monotonic spline defined by a handful of knots. Compact, smooth, tail-sensitive, and differentiable. Constraining the spline to be non-decreasing guarantees a valid CDF, and therefore a non-negative density.

Decision

Use monotonic spline distributions to model behavioral regions along each SVD dimension. The CDF is represented as a monotonic cubic spline with a small number of knots (typically 10 to 20 per dimension). Tail behavior uses exponential decay beyond the observed range.
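
As a rough illustration of this decision, the sketch below fits a CDF through quantile knots with a monotone cubic Hermite spline and attaches exponential tails that match the spline in value and slope at the outermost knots. It is pure Python and is not taken from the metaspline PoC. The function names (`fit_spline_cdf`, `cdf`), the quantile-based knot placement, the harmonic-mean slope rule, and the 1%/99% outer quantiles are all assumptions chosen for the example.

```python
import bisect
import math
import random


def fit_spline_cdf(samples, n_knots=12, p_lo=0.01, p_hi=0.99):
    """Return (xs, ps, slopes) describing a monotone cubic Hermite CDF."""
    data = sorted(samples)
    xs, ps = [], []
    for k in range(n_knots):
        p = p_lo + (p_hi - p_lo) * k / (n_knots - 1)
        x = data[min(len(data) - 1, int(p * len(data)))]  # empirical quantile
        if not xs or x > xs[-1]:  # keep knot positions strictly increasing
            xs.append(x)
            ps.append(p)
    # Secant slopes between neighbouring knots (all positive by construction).
    d = [(ps[i + 1] - ps[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    # Harmonic-mean interior slopes keep the Hermite interpolant monotone.
    slopes = [d[0]] + [2 * a * b / (a + b) for a, b in zip(d, d[1:])] + [d[-1]]
    return xs, ps, slopes


def cdf(x, xs, ps, slopes):
    """Evaluate the fitted CDF, with exponential tails outside the knots."""
    if x <= xs[0]:  # lower tail decays towards 0
        return ps[0] * math.exp(slopes[0] * (x - xs[0]) / ps[0])
    if x >= xs[-1]:  # upper tail approaches 1
        q = 1.0 - ps[-1]
        return 1.0 - q * math.exp(-slopes[-1] * (x - xs[-1]) / q)
    i = bisect.bisect_right(xs, x) - 1
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    # Cubic Hermite basis functions on [0, 1].
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return (h00 * ps[i] + h10 * h * slopes[i]
            + h01 * ps[i + 1] + h11 * h * slopes[i + 1])


# Toy calibration data: a skewed (log-normal) stand-in for one SVD dimension.
rng = random.Random(0)
calibration = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]
xs, ps, slopes = fit_spline_cdf(calibration)
grid = [min(calibration) - 1 + 0.01 * i for i in range(1000)]
values = [cdf(x, xs, ps, slopes) for x in grid]
```

Only the knot positions, their CDF values, and the slopes need to be stored per dimension. The tail rates follow from the outermost knots, so they add no extra parameters in this sketch.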

The scoring function computes how far a projection falls in the tail of the distribution — projections well within the normal region score low (CLEAR), projections near or beyond the tail score increasingly high.
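
One way such a scoring function could look is sketched below. It turns a CDF value into a two-sided tail probability and takes its negative base-10 logarithm, so the score grows smoothly as a projection moves into either tail. The ADR does not specify the exact formula, so `tail_score`, the logistic stand-in CDF, and the `CLEAR_BELOW` threshold are hypothetical and serve only to make the example self-contained.

```python
import math


def logistic_cdf(z):
    """Stand-in CDF for one dimension; a fitted spline CDF would go here."""
    return 1.0 / (1.0 + math.exp(-z))


def tail_score(u, eps=1e-12):
    """Map a CDF value u in [0, 1] to a non-negative anomaly score."""
    p = 2.0 * min(u, 1.0 - u)   # two-sided tail probability
    p = min(1.0, max(p, eps))   # clamp so the logarithm stays finite
    return math.log10(1.0 / p)  # 0 at the median, larger further into a tail


CLEAR_BELOW = 2.0  # hypothetical threshold: tail probability above 1%

for z in (0.0, 1.0, 3.0, 6.0, -6.0):
    s = tail_score(logistic_cdf(z))
    print(f"z={z:+.1f}  score={s:.2f}  clear={s < CLEAR_BELOW}")
```

Because the score is a continuous function of the CDF value, a projection just past the threshold scores only slightly higher than one just inside it.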

Consequences

Positive:

  • Smooth scoring: Continuous score rather than hard threshold, avoiding cliff-edge behavior
  • Tail sensitivity: Exponential tails capture rare-but-critical anomalous inputs without flagging the bulk of normal inputs
  • Parametric compactness: A handful of spline knots (10 to 20) represent the full distribution shape. Very small storage footprint.
  • Differentiability: Scores are differentiable — potential for future adversarial training or gradient-based analysis
  • Few shape assumptions: Unlike a Gaussian, a spline CDF is not tied to a symmetric, unimodal form, so it can represent skewed, heavy-tailed, and other non-standard shapes
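
To put the compactness claim in rough numbers, the sketch below compares the storage of a knot-based representation with that of an empirical CDF that keeps every calibration point. The dimension count, sample count, float32 storage, and two values per knot are illustrative assumptions, not figures from the codebook specification.

```python
def spline_bytes(n_dims, n_knots, values_per_knot=2, bytes_per_value=4):
    """Storage for one (position, CDF value) pair per knot, per dimension."""
    return n_dims * n_knots * values_per_knot * bytes_per_value


def ecdf_bytes(n_dims, n_samples, bytes_per_value=4):
    """Storage for an empirical CDF that keeps every calibration sample."""
    return n_dims * n_samples * bytes_per_value


# Hypothetical sizes: 64 SVD dimensions, 20 knots, 10,000 calibration inputs.
spline_total = spline_bytes(n_dims=64, n_knots=20)
ecdf_total = ecdf_bytes(n_dims=64, n_samples=10_000)
print(spline_total, ecdf_total, ecdf_total // spline_total)
```

Under these assumptions the spline representation stays in the kilobyte range regardless of how many calibration inputs were used, whereas the empirical CDF grows linearly with the calibration set.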

Negative:

  • More complex than Gaussian — requires spline fitting during codebook compilation
  • Spline knot selection affects scoring quality — poor knot placement can miss important distribution features
  • Less familiar to most ML practitioners than Gaussian or KDE

References

  • codebook.md
  • metaspline PoC: spline.py, transform.py, space.py (~280 lines total)