feat: initial architecture specification and research
Phase 0→1 setup for alknet-firewall, a behavioral signal detection library that screens untrusted LLM inputs using small-model activations.

Architecture docs (5 specs, 10 ADRs, 7 open questions):
- overview: vision, scope, dependencies, package structure
- firewall: core API, alarm protocol, score composition, error handling
- codebook: SVD basis, spline distributions, calibration, tensor format
- model: activation extraction, model-agnostic interface, lazy loading
- configuration: thresholds, model selection, detection tuning

Research reports:
- modern-python-project-setup: uv, pyproject.toml, src layout, ruff, CI
- python-ml-packaging: optional PyTorch, HF Hub download, safetensors
- llm-input-safety-landscape: threat taxonomy, defenses, academic evidence

Agent role adaptations for the Python project (replaced Rust conventions).
docs/architecture/decisions/001-python-uv.md
@@ -0,0 +1,41 @@
# ADR-001: Python with uv

## Status

Accepted

## Context

The project needs a programming language and build toolchain. The PoC was
written in Python using PyTorch, sklearn, and transformers. A Rust port using
burn/cubecl was attempted but failed, because the Rust ML framework ecosystem
is not yet mature enough for this type of work.

The project needs a fast path to a usable system, and the PoC already works in
Python. Modern Python packaging (uv, pyproject.toml, src layout) provides a
professional project structure that was not available even a few years ago.

## Decision

Use Python 3.10+ with uv as the package manager and build tool, `uv_build` as
the build backend, and a `src/` layout for the package.

## Consequences

**Positive**:
- Fast path to a working system, since the PoC code is already Python
- Rich ML ecosystem (PyTorch, transformers, sklearn, safetensors)
- uv resolves and installs dependencies 10-100x faster than pip (per uv's own benchmarks)
- Modern packaging standards (pyproject.toml, PEP 735 dependency groups)
- Easy distribution via PyPI with `pip install 'alknet-firewall[torch]'`
- Static type checking with mypy catches many interface and typing errors before runtime

**Negative**:
- Python is slower than Rust for non-ML code (SVD projection, data wrangling)
- PyTorch is a large optional dependency (200MB-2.5GB)
- A Rust port remains a future goal (Phase 3, speculative)

## References

- [modern-python-project-setup.md](../research/modern-python-project-setup.md)
- [python-ml-packaging.md](../research/python-ml-packaging.md)
docs/architecture/decisions/002-behavioral-signals.md
@@ -0,0 +1,52 @@
# ADR-002: Behavioral Signal Detection (Not Text Classification)

## Status

Accepted

## Context

Existing LLM input defenses (Llama Guard, NeMo Guardrails, Rebuff) are
text-surface approaches: they classify input text as safe or unsafe. This
fundamentally limits their effectiveness:

- Obfuscated inputs (Base64, multilingual, synonym substitution) evade keyword
  and pattern matching
- Novel attack types require retraining classifiers
- Text that looks natural to a classifier can still be adversarial when
  processed by a model

Academic research (2024–2025) reports that adversarial inputs produce
distinctive activation patterns in model internals, largely independent of
surface form.

## Decision

Build a behavioral signal detection system that monitors how a model processes
inputs (hidden state activations), not what the inputs say (text surface).
Adversarial inputs produce anomalous activation patterns that are detectable
even when the text itself looks innocent.

## Consequences

**Positive**:
- Catches obfuscated, multilingual, and novel attacks that text classifiers miss
- Anomalous behavior patterns are attack-type agnostic: novel attacks still
  produce anomalous patterns
- Multi-dimensional signals make detection interpretable (which SVD
  directions are activated, and by how much)
- Complementary to existing text-surface defenses, so the two can be layered

**Negative**:
- Requires running a model on every input (adds latency and compute cost)
- Detection depends on the detector model sharing architectural similarity
  with likely attack targets
- False positives are possible for unusual but benign inputs (domain-specific
  language, technical content)
- No existing production system validates this approach. To our knowledge,
  this would be the first

## References

- [llm-input-safety-landscape.md](../research/llm-input-safety-landscape.md)
- HiddenDetect (ACL 2025)
- Hidden Dimensions of LLM Alignment (ICML 2025)
- How Alignment and Jailbreak Work (EMNLP 2024)
docs/architecture/decisions/003-small-model-detector.md
@@ -0,0 +1,56 @@
# ADR-003: Small Model (~125M) as Detector

## Status

Accepted

## Context

The behavioral signal detection approach requires running a language model on
every input to extract hidden state activations. The choice of model size
creates a trade-off:

- **Large model (7B+)**: Better representation quality and finer behavioral
  signal resolution. However, it requires a GPU, adds ~200-500ms latency, and
  costs more per check.
- **Small model (~125M)**: Sufficient representation quality for early-layer
  safety signals. Runs on CPU, <10ms latency, negligible cost per check.
- **Tiny model (<50M)**: Too small for safety-relevant representations to
  emerge, because it lacks the depth where behavioral patterns form.

EMNLP 2024 research reports that safety signals are detectable in early
layers, so the model doesn't need deep processing to produce useful signals.
A ~125M model like SmolLM2-135M has enough depth (30 layers, 576 hidden dim)
for safety directions to emerge in early layers.

## Decision

Use a small model (~125M parameters) as the detector, with SmolLM2-135M
(269MB, 30 layers, 576 hidden dim) as the default. Target <10ms latency on
CPU. Support model-agnostic detection: any compatible model can be used by
recompiling the codebook.
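
The codebook is compiled against one detector model. A loader therefore has to refuse a codebook whose recorded model identity or shape does not match the loaded model, rather than silently scoring activations against the wrong basis. This is a minimal sketch that assumes a hypothetical `CodebookHeader` record is stored with each compiled codebook. The field names, `CodebookMismatchError`, and `check_codebook` are illustrative and are not the project's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodebookHeader:
    """Hypothetical metadata recorded when a codebook is compiled."""

    model_id: str             # e.g. a HuggingFace repo id
    revision: str             # pinned commit hash of the detector weights
    hidden_size: int          # width of the hidden states the SVD basis expects
    layers: tuple[int, ...]   # 0-based transformer block indices used at compile time


class CodebookMismatchError(RuntimeError):
    """Raised when a codebook was compiled for a different detector model."""


def check_codebook(
    header: CodebookHeader,
    model_id: str,
    revision: str,
    hidden_size: int,
    num_layers: int,
) -> None:
    """Fail loudly instead of scoring activations against the wrong basis."""
    problems = []
    if header.model_id != model_id:
        problems.append(f"model_id {header.model_id!r} != {model_id!r}")
    if header.revision != revision:
        problems.append(f"revision {header.revision!r} != {revision!r}")
    if header.hidden_size != hidden_size:
        problems.append(f"hidden_size {header.hidden_size} != {hidden_size}")
    out_of_range = [layer for layer in header.layers if not 0 <= layer < num_layers]
    if out_of_range:
        problems.append(f"layers {out_of_range} not in a model with {num_layers} layers")
    if problems:
        raise CodebookMismatchError(
            "codebook must be recompiled for this model: " + "; ".join(problems)
        )
```

Collecting every mismatch before raising tells the operator the full set of reasons at once. Without that, they would have to fix one field and rerun to discover the next.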

## Consequences

**Positive**:
- <10ms latency enables real-time pre-inference screening
- CPU-deployable: no GPU required for the firewall
- Can run alongside the target model without blocking
- Fast iteration: training or updating a 125M model takes hours, not days
- Small enough to embed in API gateways, CDN edges, and client applications
- The 269MB model download is feasible via HF Hub with caching

**Negative**:
- Lower representation quality than larger models, so it may miss subtle
  signals that a 7B detector would catch
- The detector model must share some architectural similarity with target
  models for behavioral signals to transfer
- SmolLM2-135M is English-focused, so multilingual detection requires a
  multilingual detector model
- The codebook is model-specific, so switching models requires recompilation

## References

- [model.md](../model.md)
- EMNLP 2024: Safety signals detectable in early layers
- Subliminal Learning (Nature 2026): Behavioral traits transmit through
  non-semantic signals
docs/architecture/decisions/004-svd-based-detection.md
@@ -0,0 +1,58 @@
# ADR-004: SVD-Based Anomaly Detection

## Status

Accepted

## Context

After extracting hidden state activations from the detector model, the
firewall needs a method to distinguish normal behavioral patterns from
adversarial ones. Options:

- **Single classifier**: Train a binary classifier on activations. Simple, but
  it loses the multi-dimensional structure and is a black box.
- **SVD + region comparison**: Decompose the activation space into principal
  directions, model normal behavioral regions along each direction, and detect
  inputs that fall outside normal regions. Interpretable, efficient,
  multi-dimensional.
- **Autoencoder anomaly detection**: Train an autoencoder on normal inputs and
  detect inputs with high reconstruction error. Complex, not interpretable.

ICML 2025 research shows that safety is multi-dimensional in activation space:
a dominant refusal direction plus secondary dimensions. SVD naturally
discovers these directions, and region comparison provides interpretable
per-dimension signals.

## Decision

Use SVD-based anomaly detection. Decompose the activation space via SVD to
discover principal behavioral directions, model normal regions along each
dimension using monotonic spline distributions, and detect inputs whose
projections fall outside normal regions.
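
The sketch below covers the compile-time and runtime halves of the SVD step. It assumes that calibration activations have already been stacked into an `(n_samples, hidden_size)` array. It uses `numpy.linalg.svd` (a full LAPACK SVD, like `scipy.linalg.svd`) as a stand-in for the SciPy call named in this ADR. Singular vectors are only defined up to sign, so the sketch also adds a sign convention in the spirit of scikit-learn's `svd_flip`; this convention is an assumption. `compile_basis` and `project` are hypothetical names. Modeling the normal region along each resulting coordinate is the job of the spline distributions in ADR-010.

```python
import numpy as np


def compile_basis(activations: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Return (mean, basis), where basis has shape (k, hidden_size)."""
    mean = activations.mean(axis=0)
    centered = activations - mean
    # Thin SVD: rows of vt are orthonormal principal directions, ordered by
    # decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]
    # Pin each direction's sign so its largest-magnitude loading is positive,
    # which keeps stored codebooks stable across recompilations.
    signs = np.sign(basis[np.arange(k), np.abs(basis).argmax(axis=1)])
    signs[signs == 0] = 1.0
    return mean, basis * signs[:, None]


def project(activation: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project one activation (d,) or a batch (n, d) onto k directions: O(k*d) each."""
    return (activation - mean) @ basis.T
```

Storing `mean` alongside `basis` matters. The SVD is computed on centered data, so runtime activations must be centered the same way before projection.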

## Consequences

**Positive**:
- Interpretable: each SVD direction can be labeled (refusal, role-playing, etc.)
- Efficient: projecting an activation onto k directions is a single k×d
  matrix-vector product (O(kd)), which is trivial at runtime
- Multi-dimensional: captures the multi-directional nature of safety (ICML 2025)
- Robust: SVD captures the structure of the entire activation space, not a
  single boundary
- Small-model friendly: SVD on 576-dim hidden states is computationally trivial
- Deterministic: `scipy.linalg.svd` computes an exact, non-randomized
  decomposition that is reproducible for a given input and LAPACK build. Each
  singular vector is still only defined up to sign. By contrast, `TruncatedSVD`'s
  default randomized solver depends on `random_state`

**Negative**:
- The SVD basis is model-specific, so changing the detector model requires
  recomputation
- Basis quality depends on calibration dataset coverage
- Linear decomposition may miss non-linear behavioral patterns
- Requires a codebook compilation pipeline (Phase 2)
- Full SVD on large calibration datasets may be slow. This is mitigated by the
  relatively small hidden dim of 576: a thin SVD of an n × 576 matrix scales
  as O(n · 576²)

## References

- [codebook.md](../codebook.md)
- Hidden Dimensions of LLM Alignment (ICML 2025)
- HiddenDetect (ACL 2025)
docs/architecture/decisions/005-safetensors-only.md
@@ -0,0 +1,47 @@
# ADR-005: Safetensors-Only Model Loading

## Status

Accepted

## Context

Model weight files come in two formats:

- **Pickle-based** (`.pt`, `.bin`, `.pth`): Can execute arbitrary Python code
  during loading. A known supply chain attack vector.
- **safetensors**: A simple binary format with a JSON header. No code
  execution, 76x faster CPU loading (per the safetensors documentation's
  benchmark), and zero-copy/lazy loading support.

This is a security product, and loading untrusted pickle files in a security
product is a contradiction. The LiteLLM supply chain attack (CVE-2026-33634,
CVSS 9.4) demonstrated that compromised model files can lead to credential
theft and backdoors.

## Decision

Only load model weights in safetensors format, and never load `.pt`, `.bin`,
or `.pth` files. Apply this policy to both the detector model and the codebook
tensors.
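
One way to enforce this policy is a guard that runs before any file reaches a real loader such as `safetensors.numpy.load_file` or `safetensors.torch.load_file`. The guard rejects every file type except `.safetensors`. It then parses the safetensors header as plain data: an 8-byte little-endian unsigned length, followed by that many bytes of UTF-8 JSON. `require_safetensors` and the `MAX_HEADER_BYTES` cap are assumptions for illustration. The guard complements revision pinning and checksum verification but does not replace them.

```python
from __future__ import annotations

import json
import struct
from pathlib import Path

ALLOWED_SUFFIX = ".safetensors"
# Sanity cap on the JSON header size; an assumption, not part of the ADR.
MAX_HEADER_BYTES = 100 * 1024 * 1024


def require_safetensors(path: str | Path) -> dict:
    """Refuse anything but a .safetensors file and return its parsed header.

    Nothing in the file is executed: the header is read as bytes and decoded
    as JSON describing each tensor's dtype, shape, and data offsets.
    """
    path = Path(path)
    if path.suffix != ALLOWED_SUFFIX:
        raise ValueError(f"refusing to load {path.name}: only {ALLOWED_SUFFIX} is allowed")
    with path.open("rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError(f"{path.name}: truncated safetensors header")
        (header_len,) = struct.unpack("<Q", prefix)  # little-endian u64
        if header_len > MAX_HEADER_BYTES:
            raise ValueError(f"{path.name}: header too large ({header_len} bytes)")
        raw = f.read(header_len)
    if len(raw) != header_len:
        raise ValueError(f"{path.name}: header shorter than declared")
    header = json.loads(raw.decode("utf-8"))
    if not isinstance(header, dict):
        raise ValueError(f"{path.name}: header is not a JSON object")
    return header
```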

## Consequences

**Positive**:
- Eliminates an entire class of supply chain attacks via model files
- 76x faster model loading on CPU
- Zero-copy/lazy loading reduces memory usage
- Cross-framework compatible (PyTorch, ONNX, numpy)
- Consistent with HuggingFace's own migration to safetensors as the default

**Negative**:
- Some older models only ship `.bin` weights and must be converted before use
- Safetensors stores only tensors, not arbitrary Python objects such as
  optimizer state. This is irrelevant here, since we only do inference
- transformers must be called with `use_safetensors=True`. By default,
  `from_pretrained` prefers safetensors but falls back to pickle-based `.bin`
  weights when no safetensors file is available

## References

- [python-ml-packaging.md](../research/python-ml-packaging.md), Section 6:
  safetensors format comparison
- CVE-2026-33634: LiteLLM supply chain attack
docs/architecture/decisions/006-optional-pytorch.md
@@ -0,0 +1,64 @@
# ADR-006: PyTorch as Optional Dependency

## Status

Accepted

## Context

PyTorch is the primary inference backend for the detector model. However,
PyTorch is large:

- `torch` (CPU): ~200MB download, ~700MB installed
- `torch` (CUDA): ~2.5GB download, ~5GB+ installed
- `onnxruntime`: ~30-50MB download, ~300MB installed

Making PyTorch a required dependency would force a 200MB-2.5GB download on
every user, even those who already have PyTorch installed or prefer ONNX
Runtime. This is a common problem for ML libraries, and the HuggingFace
ecosystem has converged on a solution.

## Decision

Make PyTorch an optional dependency via extras:
`pip install 'alknet-firewall[torch]'`. The base install includes all non-ML
dependencies (sklearn, huggingface-hub, safetensors, tokenizers, numpy). ML
inference backends are installed separately.

Use lazy imports with clear error messages when PyTorch is not installed:

```python
def _require_torch():
    """Import PyTorch on first use so the base package works without it.

    `_require_torch` is an illustrative helper name.
    """
    try:
        import torch
    except ImportError as err:
        # Chain the original error so the underlying cause stays visible.
        raise ImportError(
            "PyTorch is required for alknet-firewall inference. "
            "Install with: pip install 'alknet-firewall[torch]' "
            "or pip install torch --index-url https://download.pytorch.org/whl/cpu"
        ) from err
    return torch
```
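
Backends are installed separately, so the library has to discover at runtime which ones are present before choosing one. This minimal sketch uses `importlib.util.find_spec`, which checks whether a top-level module is installed without importing it. `available_backends` and `select_backend` are hypothetical names. The preference order (PyTorch first, then ONNX Runtime) is also an assumption.

```python
from __future__ import annotations

import importlib.util

# Preference order is an assumption for illustration.
BACKENDS = ("torch", "onnxruntime")


def available_backends() -> list[str]:
    """Return the optional backends that are installed, without importing them."""
    return [name for name in BACKENDS if importlib.util.find_spec(name) is not None]


def select_backend(preferred: str | None = None) -> str:
    """Pick an installed backend, honouring an explicit preference if given."""
    found = available_backends()
    if preferred is not None:
        if preferred not in found:
            raise ImportError(
                f"Inference backend {preferred!r} is not installed "
                f"(available: {', '.join(found) or 'none'})."
            )
        return preferred
    if not found:
        raise ImportError(
            "No inference backend found. "
            "Install with: pip install 'alknet-firewall[torch]'"
        )
    return found[0]
```

Probing with `find_spec` keeps `import` of the base package cheap. The heavy backend is only imported once inference actually starts.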

## Consequences

**Positive**:
- Base install is ~30MB download and ~100MB installed, which is very lightweight
- Users with existing PyTorch installations don't re-download it
- ONNX Runtime is available as a minimal-footprint alternative (roughly
  60-80MB total download, per the figures above)
- Follows HuggingFace ecosystem conventions (transformers, safetensors, and
  huggingface-hub all use this pattern)
- uv supports CPU/GPU torch variant selection via `[tool.uv.sources]` and
  `[[tool.uv.index]]`

**Negative**:
- More complex dependency specification in pyproject.toml
- Users must read the installation docs to choose the right extra
- Runtime import errors if users forget to install a backend
- CPU-only torch requires a two-step install or uv configuration, because it
  can't be expressed with pip extras alone

## References

- [modern-python-project-setup.md](../research/modern-python-project-setup.md),
  Section 2: PyTorch handling
- [python-ml-packaging.md](../research/python-ml-packaging.md), Section 1:
  PyTorch as dependency
docs/architecture/decisions/007-runtime-model-download.md
@@ -0,0 +1,53 @@
# ADR-007: Runtime Model Download via HuggingFace Hub

## Status

Accepted

## Context

The detector model (SmolLM2-135M) is ~269MB, which is too large to bundle in a
Python package. PyPI's default limits are 100MB per file and 10GB per project,
and raising them requires a request to the PyPI administrators. Even if it
were allowed, a 269MB wheel download is poor UX.

Options:
- **Bundle in package**: Not feasible due to size constraints
- **Separate package for the model**: Possible but awkward, since users must
  install two packages
- **Runtime download via HuggingFace Hub**: The standard approach used by
  transformers. Provides caching, authentication, offline mode, and checksum
  verification
- **Custom download (S3, etc.)**: Works, but reinvents the wheel

## Decision

Download the detector model at runtime via HuggingFace Hub (`snapshot_download`
or `from_pretrained` with automatic caching). Support offline mode via
`HF_HUB_OFFLINE=1` or `local_files_only=True`. Provide a CLI command for
pre-downloading models in air-gapped environments.

Pin model revisions to specific commit hashes for reproducibility.
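
This sketch shows how the download could be wired up with `huggingface_hub.snapshot_download`, using its `repo_id`, `revision`, `allow_patterns`, and `local_files_only` parameters. `allow_patterns` is restricted to safetensors weights plus JSON/text config and tokenizer files, so pickle-based weights are never fetched (ADR-005). `download_kwargs` and `fetch_detector` are hypothetical names. The caller must supply the pinned commit hash. A pre-download CLI command could call the same function.

```python
from __future__ import annotations

import os

DETECTOR_REPO = "HuggingFaceTB/SmolLM2-135M"
# Fetch only safetensors weights and JSON/text config + tokenizer files.
ALLOW_PATTERNS = ["*.safetensors", "*.json", "*.txt"]


def download_kwargs(revision: str, offline: bool | None = None) -> dict:
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    if offline is None:
        # huggingface_hub also reads HF_HUB_OFFLINE itself; mapping it to
        # local_files_only here just makes the behaviour explicit.
        offline = os.environ.get("HF_HUB_OFFLINE", "").lower() in {"1", "true", "yes", "on"}
    return {
        "repo_id": DETECTOR_REPO,
        "revision": revision,  # pinned commit hash, never a moving branch
        "allow_patterns": list(ALLOW_PATTERNS),
        "local_files_only": offline,
    }


def fetch_detector(revision: str, offline: bool | None = None) -> str:
    """Return the local snapshot directory, downloading it on first use."""
    # Deferred import keeps the network-facing dependency off the import path.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(revision, offline))
```

Because `snapshot_download` caches by revision, later calls with the same pinned hash resolve from the local cache. With `local_files_only=True`, the function only reads from disk.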

## Consequences

**Positive**:
- Package stays small (~30MB base install)
- HuggingFace Hub provides automatic caching, deduplication, and checksum
  verification
- Offline mode supported via environment variable
- Authentication for gated models via `HF_TOKEN`
- Standard approach: users familiar with transformers will recognize the
  pattern

**Negative**:
- First run requires network access and a ~269MB download (with progress bar)
- Model availability depends on HuggingFace Hub uptime
- Users in restricted networks need to pre-download models
- Different model versions may produce different detection results, so
  revisions must be pinned

## References

- [python-ml-packaging.md](../research/python-ml-packaging.md), Section 2:
  Model file distribution
- [model.md](../model.md)
docs/architecture/decisions/008-three-level-alarm.md
@@ -0,0 +1,47 @@
# ADR-008: Three-Level Alarm System

## Status

Accepted

## Context

The firewall needs to communicate detection results to downstream systems. The
design question is how many alarm levels to use and what each one means.

Alternatives:
- **Binary (safe/unsafe)**: Simple but loses nuance. Many suspicious inputs
  don't warrant blocking but should be flagged. Binary forces a single
  threshold that blocks either too much (high false positive rate) or too
  little (high false negative rate).
- **Numeric-only (0.0–1.0 score)**: Maximum information, but requires every
  consumer to choose its own threshold. No shared vocabulary for what's
  actionable.
- **Five-tier** (safe/low/medium/high/critical): Over-engineered for a
  pre-inference screening system. The difference between "low" and "medium"
  is too subtle for consumers to act on differently.
- **Three-tier** (clear/suspicious/dangerous): Balances simplicity with
  nuance. Clear = pass. Dangerous = block. Suspicious = flag for additional
  review. Most practical for automated systems.

## Decision

Use three alarm levels: `CLEAR`, `SUSPICIOUS`, `DANGEROUS`. Include a
continuous score (0.0–1.0) for consumers that need fine-grained decisions.
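
A minimal sketch of the alarm protocol pairs the three levels with the continuous score. `AlarmLevel`, `Verdict`, and `classify` are hypothetical names. The default thresholds (0.5 for `SUSPICIOUS`, 0.8 for `DANGEROUS`) are placeholders rather than calibrated values. Treating each threshold as inclusive is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class AlarmLevel(Enum):
    CLEAR = "clear"            # pass the input through
    SUSPICIOUS = "suspicious"  # flag for additional checks
    DANGEROUS = "dangerous"    # block


@dataclass(frozen=True)
class Verdict:
    level: AlarmLevel
    score: float  # continuous 0.0-1.0 score for consumers needing finer control


def classify(score: float, suspicious_at: float = 0.5, dangerous_at: float = 0.8) -> Verdict:
    """Map a continuous anomaly score onto the three alarm levels.

    The default thresholds are placeholders; real values would come from
    calibration and configuration.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if not 0.0 < suspicious_at < dangerous_at <= 1.0:
        raise ValueError("thresholds must satisfy 0 < suspicious_at < dangerous_at <= 1")
    if score >= dangerous_at:
        level = AlarmLevel.DANGEROUS
    elif score >= suspicious_at:
        level = AlarmLevel.SUSPICIOUS
    else:
        level = AlarmLevel.CLEAR
    return Verdict(level, score)
```

A consumer that needs more granularity can ignore `level` and apply its own cut-off to `score`. Everyone else gets the shared pass/flag/block vocabulary.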

## Consequences

**Positive**:
- Clear action mapping: pass, flag, block
- The suspicious level enables defense-in-depth (apply additional checks
  rather than a binary block/allow)
- The continuous score provides a gradient for consumers that need it
- Simple to document and communicate

**Negative**:
- Some consumers may need more granularity (but can use the score field)
- "Suspicious" requires consumers to decide what to do, which adds a decision
  burden

## References

- [firewall.md](../firewall.md)
docs/architecture/decisions/009-last-token-extraction.md
@@ -0,0 +1,55 @@
# ADR-009: Last-Token Activation Extraction

## Status

Accepted

## Context

To extract behavioral signals from the detector model, we must choose which
token's hidden state to use from the sequence of hidden states produced during
inference. Options:

- **Last token**: The hidden state at the final position, which has attended
  to the entire sequence. This is the standard choice for sequence
  classification with decoder-only models. GPT-style models aggregate context
  at the last position, and transformers' `LlamaForSequenceClassification`
  pools the last non-padding token.
- **Mean pooling**: Average hidden states across all positions. Smooths out
  position-specific effects but dilutes the signal from safety-relevant tokens.
- **CLS token**: A dedicated classification token (BERT-style). SmolLM2-135M
  (LLaMA architecture) does not use a CLS token.
- **First token**: Has seen only the beginning of the sequence, so it misses
  context from later tokens.
- **Max pooling**: Per-dimension maximum across positions. Noisy, because a
  single position with extreme activation can dominate.

Last-token extraction is the standard for autoregressive (GPT/LLaMA-style)
models because the last position's hidden state has attended to the full
sequence via causal attention. For safety detection, this means the last
token's representation contains the model's "conclusion" about the entire
input.

## Decision

Extract the last token's hidden state at each configured layer. This is
standard for LLaMA-family models and provides full-sequence context.
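
In a batch, "last token" means the last non-padding position of each sequence, not index `-1`. With transformers, `model(**inputs, output_hidden_states=True).hidden_states` yields one `(batch, seq_len, hidden_size)` tensor per layer, plus the embedding output at index 0. This sketch assumes one of those layers has been converted to a NumPy array. `last_token_states` is a hypothetical name. The function handles both right- and left-padded batches.

```python
import numpy as np


def last_token_states(hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Pick each sequence's hidden state at its last non-padding position.

    hidden: (batch, seq_len, hidden_size) activations from one layer.
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding.
    """
    mask = np.asarray(attention_mask, dtype=bool)
    if not mask.any(axis=1).all():
        raise ValueError("every sequence needs at least one non-padding token")
    seq_len = mask.shape[1]
    # argmax on the reversed boolean mask finds the first real token counted
    # from the end; convert that back to a forward index.
    last = seq_len - 1 - np.argmax(mask[:, ::-1], axis=1)
    return hidden[np.arange(hidden.shape[0]), last]
```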

## Consequences

**Positive**:
- Standard approach for autoregressive models, well validated
- Full sequence context via causal attention
- Single vector per layer, which is simple to project and score
- No pooling over padded positions (unlike mean pooling, which must apply the
  attention mask), provided batched inputs select the last non-padding token

**Negative**:
- Position-dependent: the last token's representation is influenced by its
  position in the sequence, not just its content
- Very short inputs (1–2 tokens) may not have enough context for meaningful
  activation patterns
- May miss patterns in long inputs where the adversarial payload is in the
  middle rather than at the end

## References

- [model.md](../model.md)
- [codebook.md](../codebook.md)
@@ -0,0 +1,64 @@
# ADR-010: Monotonic Spline Distributions for Behavioral Region Modeling

## Status

Accepted

## Context

After projecting activations onto SVD dimensions, the firewall needs to score
how "normal" or "anomalous" a projection is relative to the distribution of
normal inputs. This requires modeling the probability distribution of normal
inputs along each dimension.

Alternatives:
- **Gaussian**: Simple and well understood. However, real behavioral
  distributions are often skewed, multimodal, or heavy-tailed, while a
  Gaussian assumes symmetry.
- **Kernel Density Estimation (KDE)**: Non-parametric and flexible. However,
  bandwidth selection is tricky, and KDE doesn't provide a parametric form
  for efficient storage and fast evaluation.
- **Mixture of Gaussians**: More flexible than a single Gaussian, but requires
  choosing the number of components and risks overfitting.
- **Empirical CDF**: Non-parametric and assumption-free, but requires storing
  all calibration data points, so it is not compact.
- **Monotonic spline distributions**: A parametric CDF modeled as a monotonic
  spline. Compact (a handful of knots), smooth, tail-sensitive, and
  differentiable. Because the spline is constrained to be monotonic, the
  fitted CDF is always non-decreasing, which guarantees a valid probability
  distribution.

## Decision

Use monotonic spline distributions to model behavioral regions along each SVD
dimension. The CDF is represented as a monotonic cubic spline with a small
number of knots (typically 10–20 per dimension). Tail behavior uses
exponential decay beyond the observed range.

The scoring function computes how far a projection falls in the tail of the
distribution. Projections well within the normal region score low (CLEAR),
while projections near or beyond the tail score increasingly high.
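
The pure-Python sketch below models a single SVD dimension under several stated assumptions, and it is not the metaspline PoC's code. Knots sit at empirical quantiles between `tail_mass` and `1 - tail_mass`. Tangents are the harmonic mean of neighbouring secant slopes, a standard monotonicity-preserving choice for cubic Hermite interpolation. Beyond the outer knots, the CDF decays exponentially, with value and slope matched at the boundary. The class name, knot placement, and the `tail_score` formula `1 - 2 * min(F, 1 - F)` are all illustrative assumptions.

```python
import bisect
import math


class MonotoneSplineCDF:
    """Monotone cubic Hermite CDF fitted to 1-D calibration projections."""

    def __init__(self, samples, n_knots=12, tail_mass=0.01):
        if not 0.0 < tail_mass < 0.5:
            raise ValueError("tail_mass must be in (0, 0.5)")
        xs = sorted(samples)
        n = len(xs)
        if n < 2 or n_knots < 2:
            raise ValueError("need at least two samples and two knots")
        self.x, self.p = [], []
        for i in range(n_knots):
            p = tail_mass + (1.0 - 2.0 * tail_mass) * i / (n_knots - 1)
            pos = p * (n - 1)  # linearly interpolated empirical quantile
            lo = int(pos)
            hi = min(lo + 1, n - 1)
            q = xs[lo] + (pos - lo) * (xs[hi] - xs[lo])
            if self.x and q <= self.x[-1]:
                continue  # skip tied quantiles so knots stay strictly increasing
            self.x.append(q)
            self.p.append(p)
        if len(self.x) < 2:
            raise ValueError("calibration samples are (nearly) constant")
        secants = [
            (self.p[k + 1] - self.p[k]) / (self.x[k + 1] - self.x[k])
            for k in range(len(self.x) - 1)
        ]
        # Harmonic-mean tangents never exceed twice either neighbouring secant,
        # which keeps every Hermite segment non-decreasing.
        self.m = [secants[0]]
        for a, b in zip(secants, secants[1:]):
            self.m.append(2.0 * a * b / (a + b))
        self.m.append(secants[-1])

    def cdf(self, v):
        x, p, m = self.x, self.p, self.m
        if v < x[0]:  # left tail: p0 * exp(m0 * (v - x0) / p0)
            return p[0] * math.exp(m[0] * (v - x[0]) / p[0])
        if v > x[-1]:  # right tail: mirror image of the left tail
            q = 1.0 - p[-1]
            return 1.0 - q * math.exp(-m[-1] * (v - x[-1]) / q)
        k = min(bisect.bisect_right(x, v) - 1, len(x) - 2)
        h = x[k + 1] - x[k]
        t = (v - x[k]) / h
        t2, t3 = t * t, t * t * t
        return (
            (2 * t3 - 3 * t2 + 1) * p[k]
            + (t3 - 2 * t2 + t) * h * m[k]
            + (-2 * t3 + 3 * t2) * p[k + 1]
            + (t3 - t2) * h * m[k + 1]
        )

    def tail_score(self, v):
        """0.0 at the median, approaching 1.0 deep in either tail."""
        f = self.cdf(v)
        return 1.0 - 2.0 * min(f, 1.0 - f)
```

Only `x`, `p`, and `m` (three short lists per dimension) need to be stored in the codebook. That is what gives the approach its small footprint compared with an empirical CDF.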

## Consequences

**Positive**:
- **Smooth scoring**: A continuous score rather than a hard threshold avoids
  cliff-edge behavior
- **Tail sensitivity**: Exponential tails capture rare-but-critical anomalous
  inputs without flagging the bulk of normal inputs
- **Parametric compactness**: A handful of spline knots (10–20) represents the
  full distribution shape, so the storage footprint is very small
- **Differentiability**: Scores are differentiable, which opens the door to
  future adversarial training or gradient-based analysis
- **Few distributional assumptions**: Unlike a Gaussian, spline distributions
  handle skew, heavy tails, and non-standard shapes

**Negative**:
- More complex than a Gaussian, since it requires spline fitting during
  codebook compilation
- Spline knot selection affects scoring quality, and poor knot placement can
  miss important distribution features
- Less familiar to most ML practitioners than Gaussian or KDE

## References

- [codebook.md](../codebook.md)
- metaspline PoC: `spline.py`, `transform.py`, `space.py` (~280 lines total)