
| status | last_updated |
|---|---|
| draft | 2026-06-13 |

alknet-firewall — Architecture

Current State

Phase 0→1 (Exploration → Architecture). The project has a working PoC showing that behavioral signals from small language models can detect adversarial inputs. The core detection logic (~1,745 lines) works reasonably well, but it has no tests, depends on an oversized codebook (1,245 lines; see OQ-02), and still needs to be extracted from the research codebase into a properly structured Python package.

This project extracts and productionizes the behavioral signal detection approach from the metaspline research project. A small language model in the ~125M-parameter class (SmolLM2-135M) processes untrusted inputs, and its hidden state activations are captured. SVD-based dimensionality reduction of these activations exposes behavioral patterns: normal inputs cluster in expected regions, while adversarial inputs produce anomalous activation signatures. The system can therefore raise "behavioral alarms" without needing to know specific attack types in advance.
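The core idea can be illustrated with a small sketch, assuming activations arrive as a NumPy array of shape (n_inputs, hidden_dim) and that "anomalous" means a large residual outside the top-k SVD subspace fitted on benign inputs. The function names (`fit_svd_basis`, `residual_score`, `raises_alarm`), the residual-norm score, the 99th-percentile threshold, and the random toy data are illustrative assumptions, not the package's API or the PoC's actual scoring (see ADR-004 and ADR-010).

```python
import numpy as np


def fit_svd_basis(benign: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Fit a rank-k basis to benign activations (one row per input).

    Returns the per-dimension mean and the top-k right singular vectors,
    shape (k, hidden_dim), whose rows are orthonormal.
    """
    mean = benign.mean(axis=0)
    # full_matrices=False gives the compact (economy) SVD
    _, _, vt = np.linalg.svd(benign - mean, full_matrices=False)
    return mean, vt[:k]


def residual_score(x: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Norm of the part of each activation that the benign subspace cannot explain."""
    centered = np.atleast_2d(x) - mean
    projected = centered @ basis.T @ basis  # orthogonal projection onto the subspace
    return np.linalg.norm(centered - projected, axis=1)


def raises_alarm(x: np.ndarray, mean: np.ndarray, basis: np.ndarray, threshold: float) -> bool:
    """True if a single activation vector falls too far outside the benign subspace."""
    return bool(residual_score(x, mean, basis)[0] > threshold)


# Toy stand-in for real activations: benign inputs lie near a 2-D subspace of R^16.
rng = np.random.default_rng(0)
directions = rng.normal(size=(2, 16))
benign = rng.normal(size=(500, 2)) @ directions + 0.01 * rng.normal(size=(500, 16))

mean, basis = fit_svd_basis(benign, k=2)
# Calibrate the alarm threshold on benign data (the 99th percentile is an arbitrary choice)
threshold = float(np.percentile(residual_score(benign, mean, basis), 99))

normal_input = rng.normal(size=(1, 2)) @ directions  # lies on the benign subspace
adversarial_input = rng.normal(size=(1, 16))          # generic off-subspace vector
print(raises_alarm(normal_input, mean, basis, threshold))       # False
print(raises_alarm(adversarial_input, mean, basis, threshold))  # True
```

A residual (off-subspace) score needs only benign examples to calibrate, which is what lets this style of detector flag inputs without a catalogue of attack types.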

Architecture Documents

| Document | Status | Description |
|---|---|---|
| overview.md | Draft | Vision, scope, package structure, dependencies |
| firewall.md | Draft | Core firewall API, input screening, alarm protocol |
| codebook.md | Draft | SVD basis, detection parameters, codebook compilation |
| model.md | Draft | Model loading, activation extraction, model-agnostic design |
| configuration.md | Draft | Thresholds, model selection, detection tuning |
| open-questions.md | Active | Tracker for unresolved questions, indexed by OQ-ID |

ADR Table

| ADR | Title | Status |
|---|---|---|
| 001 | Python with uv | Accepted |
| 002 | Behavioral Signal Detection (Not Text Classification) | Accepted |
| 003 | Small Model (~125M) as Detector | Accepted |
| 004 | SVD-Based Anomaly Detection | Accepted |
| 005 | Safetensors-Only Model Loading | Accepted |
| 006 | PyTorch as Optional Dependency | Accepted |
| 007 | Runtime Model Download via HuggingFace Hub | Accepted |
| 008 | Three-Level Alarm System | Accepted |
| 009 | Last-Token Activation Extraction | Accepted |
| 010 | Monotonic Spline Distributions | Accepted |
| 011 | Standalone API + Thin Adapter Integration | Accepted |
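ADR-009's last-token extraction can be sketched as follows, assuming one layer's hidden states are available as a NumPy array alongside the tokenizer's attention mask (1 for real tokens, 0 for padding). The function name `last_token_activations` is hypothetical, and the real extraction path (where PyTorch is optional per ADR-006) may differ.

```python
import numpy as np


def last_token_activations(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Pick each sequence's last real (non-padding) token activation.

    hidden_states:  (batch, seq_len, hidden_dim) activations from one layer
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    Returns an array of shape (batch, hidden_dim).
    """
    mask = np.asarray(attention_mask).astype(bool)
    if not mask.any(axis=1).all():
        raise ValueError("every sequence needs at least one real token")
    seq_len = mask.shape[1]
    # argmax on the reversed mask finds the last True, so left- and
    # right-padded batches are both handled
    last_idx = seq_len - 1 - np.argmax(mask[:, ::-1], axis=1)
    return hidden_states[np.arange(mask.shape[0]), last_idx]


hidden = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
mask = np.array([[1, 1, 1, 0],   # right-padded: last real token at index 2
                 [0, 1, 1, 1]])  # left-padded: last real token at index 3
print(last_token_activations(hidden, mask).tolist())
# [[6.0, 7.0, 8.0], [21.0, 22.0, 23.0]]
```

Searching the reversed mask instead of computing `mask.sum() - 1` avoids silently picking a padding position when a tokenizer pads on the left.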

Open Questions

See open-questions.md for the full tracker.

| OQ | Question | Priority | Status |
|---|---|---|---|
| OQ-01 | Should ONNX Runtime be a supported inference backend in Phase 1? | medium | resolved (removed from scope; burn/cublas is a better future path) |
| OQ-02 | What is the minimum viable codebook? Can the 1,245-line codebook be compressed? | high | resolved (~65% compression, to 500-600 lines) |
| OQ-03 | Should the firewall support streaming/chunked input screening? | medium | open (research complete; targeted for Phase 2) |
| OQ-04 | Should detection thresholds be per-model or globally configurable? | medium | resolved (both: model-specific defaults, user-overridable) |
| OQ-05 | How should the firewall integrate with existing guardrail systems? | medium | resolved (ADR-011: standalone API + thin adapters) |
| OQ-06 | Should file-based configuration use TOML or YAML? | low | resolved (TOML) |

Document Lifecycle

| Status | Meaning | Transitions |
|---|---|---|
| draft | Under active development. May change significantly. | Becomes reviewed when open questions are resolved |
| reviewed | Architecture is final. Implementation may begin. Changes require review. | Becomes stable when implementation is complete |
| stable | Locked. Changes require review and may warrant an ADR. | Becomes deprecated when superseded |
| deprecated | Superseded. Kept for reference. | Removed when no longer referenced |
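The lifecycle is a small linear state machine. A minimal sketch of enforcing it, for example in a docs lint script; the `DocStatus` enum, the `transition` helper, and the open-question gate are hypothetical, and removal of a deprecated document is modeled simply as having no further status transitions.

```python
from enum import Enum


class DocStatus(Enum):
    DRAFT = "draft"
    REVIEWED = "reviewed"
    STABLE = "stable"
    DEPRECATED = "deprecated"


# Allowed forward moves, one per row of the lifecycle table.
ALLOWED: dict[DocStatus, frozenset[DocStatus]] = {
    DocStatus.DRAFT: frozenset({DocStatus.REVIEWED}),
    DocStatus.REVIEWED: frozenset({DocStatus.STABLE}),
    DocStatus.STABLE: frozenset({DocStatus.DEPRECATED}),
    DocStatus.DEPRECATED: frozenset(),  # next step is deletion, not a status
}


def transition(current: DocStatus, new: DocStatus, open_questions: int = 0) -> DocStatus:
    """Validate a status change and return the new status."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    if current is DocStatus.DRAFT and open_questions > 0:
        raise ValueError("resolve open questions before marking a document reviewed")
    return new


print(transition(DocStatus.DRAFT, DocStatus.REVIEWED).value)  # reviewed
```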