docs: resolve 4 open questions, add research, spec codebook package structure
Research-driven resolution of OQ-01, OQ-02, OQ-05, OQ-06:

- OQ-01: Remove ONNX Runtime from scope entirely — doesn't support activation extraction natively (optimum #972 closed as not planned), bloated model exports; burn/cublas via safetensors is a better future path
- OQ-02: Codebook compresses ~65% (1,245 → 500-600 lines); add Package Structure and Extraction from PoC sections to codebook.md based on PoC analysis of metaspline firewall_codebook.py
- OQ-05: Standalone API + thin adapter pattern (ADR-011); Phase 1 ships Firewall.screen() only, Phase 2 adds <100-line adapter packages for LlamaFirewall, OpenAI Agents SDK, NeMo Guardrails
- OQ-06: TOML for file-based config — standard modern Python, two-way door

Also: research OQ-03 rolling windows from taskgraph-semantic reference code, remove onnxruntime/optimum from dependencies, move streaming screening to Phase 2, add burn/cublas as Phase 3 alternative backend.
docs/research/codebook-analysis/poc-architecture.md
@@ -0,0 +1,440 @@

# Research: PoC Codebook Architecture Analysis (OQ-02)

**Date**: 2026-06-13
**Status**: Complete
**Question**: What is the minimum viable codebook? Can the 1,245-line PoC codebook be compressed, and what is essential vs. exploratory/dead code?

---

## 1. PoC Architecture Overview

### 1.1 File Structure & Role

The PoC codebook lives in `firewall_codebook.py` (1,245 lines) and depends on three metaspline core modules:

```
firewall_codebook.py (1,245 lines)
├── Imports from metaspline core:
│   ├── metaspline.spline.SplineDistribution          (spline.py, 378 lines)
│   ├── metaspline.spline.ensure_strictly_increasing  (spline.py)
│   ├── metaspline.space.unfold / fold                (space.py, 46 lines)
│   └── metaspline.transform.simplex                  (transform.py, 78 lines)
├── External imports:
│   ├── sklearn.linear_model.LogisticRegression
│   └── sklearn.mixture.GaussianMixture               (imported but unused)
└── Internal definitions (see §1.2)
```

### 1.2 Major Sections of `firewall_codebook.py`

| Lines | Component | Description |
|-------|-----------|-------------|
| 1–50 | Module docstring + imports | Theory overview, imports |
| 53–66 | `reverse_bary3d()` | Simplex → barycentric (u,v) transform |
| 69–74 | `bary_to_simplex()` | Inverse: barycentric → simplex |
| 77–112 | `DirectionProfile` dataclass | Per-contrast statistical profile |
| 114–127 | `DirectionClassifier` dataclass | Per-contrast logistic regression weights |
| 129–146 | `HistogramClassifier` dataclass | 2×2×2 codebook-state histogram classifier |
| 148–165 | `DetectionResult` dataclass | Output of `detect()` |
| 167–596 | `FirewallCodebook.__init__` + `build()` | Codebook construction (429 lines!) |
| 598–629 | `FirewallCodebook.decompose()` | z → (sum, u, v) copula transform |
| 631–669 | `FirewallCodebook.classify()` | Per-contrast logistic classification |
| 671–729 | `FirewallCodebook.classify_histogram()` | 8-state histogram classification |
| 731–860 | `FirewallCodebook.detect()` | Main detection entry point |
| 862–884 | `FirewallCodebook.detect_from_perturbations()` | Convenience: P → z → detect |
| 886–945 | `FirewallCodebook.summary()` | Human-readable summary |
| 947–1041 | `FirewallCodebook.evaluate_auc()` | AUC evaluation on held-out data |
| 1044–1118 | `build_codebook_from_precomputed()` | Load from saved .pt files |
| 1121–1245 | `__main__` block | Script-mode evaluation + duplicated data loading |

### 1.3 Dependency Map

```
                 ┌──────────────────┐
                 │ FirewallCodebook │
                 │   (main class)   │
                 └────────┬─────────┘
                          │
          ┌───────────────┼───────────────┐
          │               │               │
┌─────────▼──┐     ┌──────▼──────┐  ┌─────▼─────┐
│ SplineDist │     │  simplex()  │  │ bary3d()  │
│ (CDF/ICDF) │     │ (transform) │  │  (local)  │
└─────────┬──┘     └─────────────┘  └───────────┘
          │
┌─────────▼──────────────┐
│ MonotonicCubicSpline   │
│ (pchip interpolation)  │
└────────────────────────┘
```

The `FirewallCodebook` has these hard dependencies at runtime:
1. **SplineDistribution** — CDF/ICDF transforms (population fitting + inference)
2. **simplex()** — normalize to simplex (x/sum(x))
3. **reverse_bary3d()** — project simplex to 2D barycentric coordinates
4. **torch** — tensor operations
5. **numpy** — sklearn bridge for training only

Training-time dependencies (not needed at inference):
- **sklearn.linear_model.LogisticRegression** — classifier training
- **sklearn.metrics.silhouette_score** — profile quality metric
- **sklearn.metrics.roc_auc_score** — evaluation metric

---

## 2. Essential vs. Exploratory vs. Dead Code Classification

### 2.1 Essential (Required for Production Codebook)

These are the core components that must be extracted into the production package:

| Component | Lines | Role | Production Mapping |
|-----------|-------|------|-------------------|
| `reverse_bary3d()` | 53–66 | z → (u,v) barycentric projection | `codebook/transforms.py` |
| `bary_to_simplex()` | 69–74 | Inverse barycentric (needed for reconstruction) | `codebook/transforms.py` |
| `SplineDistribution` | spline.py:200–261 | CDF/ICDF for copula transform | `codebook/splines.py` (adapted) |
| `MonotonicCubicSpline` | spline.py:80–197 | PCHIP interpolation engine | `codebook/splines.py` (adapted) |
| `ensure_strictly_increasing` | spline.py:43–73 | Knot sanitization | `codebook/splines.py` |
| `simplex()` | transform.py:34–36 | Normalize to unit simplex | `codebook/transforms.py` |
| `FirewallCodebook.__init__` | 182–203 | State initialization | `codebook/codebook.py` |
| `FirewallCodebook.decompose()` | 598–629 | z → (sum, u, v) copula space | `codebook/projection.py` |
| `FirewallCodebook.detect()` | 731–860 | Main detection logic | `codebook/detection.py` |
| `DetectionResult` | 148–165 | Output dataclass | `codebook/results.py` |
| `FirewallCodebook.build()` (core logic only) | 204–396 | SVD, spline fitting, profile computation | `training/compiler.py` |
| `DirectionProfile` | 77–112 | Per-direction statistical profile | `codebook/profiles.py` |
| `DirectionClassifier` | 114–127 | Per-direction linear classifier | `codebook/classifiers.py` |
| `FirewallCodebook.detect_from_perturbations()` | 862–884 | P → z convenience wrapper | `codebook/projection.py` |

**Total essential lines**: ~480 lines in `firewall_codebook.py`, plus the ~178 lines of metaspline core that the codebook actually uses (see §4.1)

### 2.2 Exploratory / Research Code

These were useful for research but are **not needed** in production:

| Component | Lines | Purpose | Disposition |
|-----------|-------|---------|-------------|
| `HistogramClassifier` dataclass | 129–146 | Alternative 2×2×2 discretized classifier | Keep as optional, not MVP |
| `classify_histogram()` | 671–729 | Histogram-based classification variant | Research variant, not MVP |
| `build()` histogram classifier section | 481–596 | Training histogram classifiers | Research variant |
| `evaluate_auc()` | 947–1041 | Offline AUC evaluation | Testing/benchmarking only |
| `summary()` | 886–945 | Human-readable codebook summary | Debugging/diagnostic tool |
| `classify()` | 631–669 | Per-position probability output | Subsumed by `detect()` |
| `build_codebook_from_precomputed()` | 1044–1118 | Load from .pt files | Training pipeline I/O |
| `build()` contrast_pairs default | 268–276 | Hardcoded 7-pair contrast list | Config, not code |
| `pooled_std()` inner function | 327–331 | Statistical utility | Extract to `training/stats.py` |
| `cohen_d()` inner function | 337–340 | Effect size utility | Extract to `training/stats.py` |
| `compute_silhouette()` inner function | 365–370 | Quality metric | Training diagnostic |
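
A minimal sketch of how `pooled_std()` and `cohen_d()` might look once lifted to module level in `training/stats.py`. It uses the textbook definitions with NumPy; the PoC's inner functions are not reproduced here and may differ in details such as the degrees-of-freedom correction, so the signatures below are assumptions.

```python
import numpy as np


def pooled_std(a: np.ndarray, b: np.ndarray) -> float:
    """Pooled sample standard deviation of two 1-D samples (ddof=1 assumed)."""
    na, nb = len(a), len(b)
    var_a = np.var(a, ddof=1)
    var_b = np.var(b, ddof=1)
    return float(np.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)))


def cohen_d(a: np.ndarray, b: np.ndarray) -> float:
    """Cohen's d effect size: difference of means over the pooled std."""
    s = pooled_std(a, b)
    # Guard against a degenerate (zero-variance) pair of samples.
    return float((np.mean(a) - np.mean(b)) / s) if s > 0 else 0.0


a = np.array([2.0, 4.0, 6.0])
b = np.array([1.0, 3.0, 5.0])
print(pooled_std(a, b))  # 2.0
print(cohen_d(a, b))     # 0.5
```

Keeping both helpers as plain module-level functions makes them unit-testable without constructing a codebook.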

### 2.3 Dead Code

| Component | Lines | Issue |
|-----------|-------|-------|
| `sklearn.mixture.GaussianMixture` import | 44 | Imported but never used |
| `unfold()` / `fold()` from `space.py` | space.py:4–45 | Imported but never called in codebook |
| `dcs_norm()` from `transform.py` | transform.py:20–23 | Imported but never used |
| `__main__` block duplicated data loading | 1121–1245 | Lines 1203–1245 repeat 1126–1182 verbatim with different formatting — copy-paste artifact |
| `bary_to_simplex()` | 69–74 | Defined but never called in the PoC codebook; kept in production only for reconstruction (see §2.1) |
| `DensitySpline` class | spline.py:315–378 | Legacy alternative, not used by codebook |
| `empirical_cdf()` / `empirical_density()` / `log_bins()` / `generate_asymmetric_knots()` | spline.py:268–313 | Utility functions not used by codebook |

### 2.4 Infrastructure (Training Pipeline, Not Runtime)

| Component | Lines | Purpose |
|-----------|-------|---------|
| `run_manifold_projection.py` (entire) | 823 | Model loading, data collection, SVD computation, saving artifacts |
| `analyzer.py` (entire) | 560 | Multi-layer direction analysis, residual extraction |
| `discover_directions.py` (entire) | 401 | Post-hoc direction discovery from trajectory data |
| `build()` SVD computation section | 229–233 | Population SVD → V3 basis |

---

## 3. Training Pipeline Analysis

### 3.1 `run_manifold_projection.py` — Step by Step

The training pipeline performs these operations:

1. **Model Loading** (L79–103): Load HuggingFace model + tokenizer. Configure for GPU/CPU.

2. **Condition Catalog Construction** (L106–153): Build contrastive prompt sets for 8 behavioral contrasts (condition pairs):
   - self_ref / other_ref
   - violated / expected (semantic)
   - code_violated / code_expected
   - instruction / data
   - tool_call / natural_language
   - uncertain / confident
   - harmful / harmless
   - injection / benign_instruction

3. **Feature Extraction** (L156–213): For each condition, extract:
   - Hidden states across all layers → `residuals` (n_prompts, n_layers+1, hidden_dim)
   - ICDF perturbation vectors → `perturbations` (n_prompts, 64)
   - Last-layer hidden states → `hidden_last` (n_prompts, hidden_dim)

4. **SVD Computation** (L216–263):
   - Activation SVD: `H_all` (N, 2048) → principal components in hidden state space
   - Perturbation SVD: `P_all` (N, 64) → the **3D perturbation manifold** (this is the basis V3)

5. **Direction Vector Computation** (L393–434): Per-contrast mean-difference direction vectors at best layers.

6. **Projection Analysis** (L436–668): Extensive analysis of direction projections onto activation/perturbation subspaces. **This is research output, not needed for codebook compilation.**

7. **Save Results** (L670–755):
   - `.json`: Scalar metrics, SVD variance, separation stats
   - `.pt`: Tensors — **this is the key artifact**:
     - `perturbation_svd_Vh` → top-k right-singular vectors (the SVD basis)
     - `perturbation_mean` → population mean for centering
     - `condition_perturbations` → per-condition perturbation vectors
     - `condition_hidden_last` → last-layer hidden states per condition

### 3.2 Codebook Artifact Production

The `.pt` file from `run_manifold_projection.py` feeds directly into `build_codebook_from_precomputed()`, which:

1. Loads `.pt` file → extracts `perturbation_svd_Vh[:3]` (V3 basis) and `perturbation_mean` (P_mean)
2. Reconstructs z-coords: `z = (P - P_mean) @ V3.T`
3. Calls `FirewallCodebook.build()` which:
   - Fits SplineDistribution on each z dimension (population)
   - Fits SplineDistribution on sums (population)
   - Decomposes each condition via CDF → (sum, u, v)
   - Computes DirectionProfiles (pooled stats, Cohen's d, thresholds)
   - Trains DirectionClassifiers (logistic regression per contrast)
   - Trains HistogramClassifiers (8-state discrete classifiers)
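
Steps 1 and 2 can be sketched in a few lines. This version uses NumPy on random stand-in data rather than the PoC's torch tensors and saved `.pt` file, and it assumes the SVD is computed on mean-centered perturbations; the variable names mirror the ones used above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for P_all: N perturbation vectors of dimension 64.
P = rng.normal(size=(200, 64))

# Population mean used for centering (perturbation_mean in the .pt file).
P_mean = P.mean(axis=0)

# Rows of Vh are right-singular vectors, ordered by decreasing singular value.
_, singular_values, Vh = np.linalg.svd(P - P_mean, full_matrices=False)

# V3 basis: the top three right-singular vectors, shape (3, 64).
V3 = Vh[:3]

# z-coordinates: centered perturbations projected onto the basis, shape (N, 3).
z = (P - P_mean) @ V3.T
print(z.shape)  # (200, 3)
```

Because the rows of `V3` are orthonormal, the projection is a plain matrix multiply with no inverse needed, which is what lets the runtime `project()` stay a thin wrapper.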

**The produced codebook artifacts map to the production spec as:**

| PoC Artifact | Production Format | Notes |
|---|---|---|
| `FirewallCodebook.z_splines` (3× SplineDistribution) | `splines.json` (knot positions + coefficients) | Spline knots serialized as JSON arrays |
| `FirewallCodebook.svd_V3` (3×64 tensor) | `basis.safetensors` → `basis_vectors` | Reshaped for multi-layer format |
| `FirewallCodebook.population_mean_P` (64 tensor) | `basis.safetensors` → `mean` | Centering vector |
| `FirewallCodebook.direction_profiles` (dict) | `regions.safetensors` → centroids, scale | Per-direction statistical profiles |
| `FirewallCodebook.classifiers` (dict) | Part of `config.json` or `regions.safetensors` | Logistic weights (3 floats + intercept per direction) |
| `FirewallCodebook.sum_spline` (SplineDistribution) | `splines.json` | Sum distribution spline |
| `FirewallCodebook.population_stats` (dict) | `regions.safetensors` → centroids, scale | Population baselines |
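
As an illustration of the `splines.json` rows, the sketch below round-trips spline knots through JSON. The schema (`x`/`y` arrays keyed by spline name), the spline names, and the helpers `save_splines()` and `load_splines()` are all hypothetical; the production spec in `codebook.md` defines the real layout.

```python
import json
import os
import tempfile

import numpy as np


def save_splines(path, splines):
    """Write {name: (x_knots, y_knots)} as JSON arrays (hypothetical schema)."""
    payload = {
        name: {"x": np.asarray(x).tolist(), "y": np.asarray(y).tolist()}
        for name, (x, y) in splines.items()
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)


def load_splines(path):
    """Read the JSON file back into {name: (x_knots, y_knots)} arrays."""
    with open(path, encoding="utf-8") as fh:
        payload = json.load(fh)
    return {
        name: (np.array(entry["x"]), np.array(entry["y"]))
        for name, entry in payload.items()
    }


knots = np.linspace(-3.0, 3.0, 8)
cdf_values = np.linspace(0.0, 1.0, 8)
original = {"z0": (knots, cdf_values), "sum": (knots + 1.0, cdf_values)}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "splines.json")
    save_splines(path, original)
    restored = load_splines(path)
```

Python's `json` module writes floats with their shortest round-trippable representation, so the restored knots compare equal to the originals, which is the property a serialize-then-load regression test would rely on.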

---

## 4. Core Library Assessment

### 4.1 Metaspline Core Usage

The metaspline core (`spline.py` 378 lines, `transform.py` 78 lines, `space.py` 46 lines — 502 lines total) provides:

| Module | Lines | Used by Codebook | Lines Actually Used |
|--------|-------|-------------------|---------------------|
| `spline.py` | 378 | `SplineDistribution`, `ensure_strictly_increasing` | ~175 lines (SplineDistribution + MonotonicCubicSpline + ensure_strictly_increasing) |
| `transform.py` | 78 | `simplex()` only | 3 lines |
| `space.py` | 46 | None (imported but unused) | 0 lines |

**Actual dependency: ~178 lines out of 502.** The codebook uses only `SplineDistribution` (CDF/ICDF), `MonotonicCubicSpline` (its backbone), `ensure_strictly_increasing`, and `simplex()`. The following are unused:

- `DensitySpline` class (spline.py, 60 lines) — legacy CDF-based distribution, not used
- `empirical_cdf()`, `empirical_density()`, `log_bins()`, `generate_asymmetric_knots()` (spline.py, ~45 lines) — utility functions, unused
- `unfold()` / `fold()` (space.py, 46 lines) — digit expansion/contraction, unused
- `double_cumsum()`, `double_diff()`, `dcs_norm()`, `normalize_01()`, `column_cdf_normalize()`, `toBase()`, `numSymbols()`, `ndVec()` (transform.py, ~75 lines) — unused

### 4.2 How Much Is Inline vs. Library?

The `FirewallCodebook.build()` method has **significant inline reimplementation** of statistical operations that could be cleaner:

- **Lines 229–233**: SVD computation is inline (should use the pipeline's `compute_perturbation_svd()`)
- **Lines 236–246**: Spline fitting is inline but delegates to `SplineDistribution`
- **Lines 313–324**: CDF → decompose → barycentric is duplicated 3× (in `build()`, `classify()`, `classify_histogram()`)
- **Lines 327–340**: `pooled_std()` and `cohen_d()` are inner functions, not module-level
- **Lines 365–370**: `compute_silhouette()` is an inner function with sklearn import

The core decomposition pipeline (z → CDF → simplex → barycentric → (sum, u, v)) appears **verbatim** in:
1. `build()` lines 242–250 (population)
2. `build()` lines 313–324 (per-condition, profile computation)
3. `build()` lines 445–456 (per-condition, classifier computation)
4. `build()` lines 521–532 (per-condition, histogram computation)
5. `decompose()` lines 610–628 (runtime inference)

This is the **single most compressible pattern** — a 10-line decomposition sequence repeated 5 times.
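
A single shared `decompose()` could look roughly like the sketch below. Two substitutions are assumptions made to keep it self-contained: a linearly interpolated empirical CDF stands in for `SplineDistribution`, and the barycentric step uses the common ternary-plot convention with triangle vertices (0, 0), (1, 0), (0.5, √3/2), which may not match the PoC's `reverse_bary3d()`.

```python
import numpy as np


def fit_empirical_cdf(population):
    """Return (sorted values, plotting positions); a stand-in for a fitted spline CDF."""
    xs = np.sort(np.asarray(population, dtype=float))
    qs = np.arange(1, len(xs) + 1) / (len(xs) + 1)  # strictly inside (0, 1)
    return xs, qs


def simplex(x):
    """Normalize positive rows to the unit simplex: x / sum(x)."""
    return x / x.sum(axis=-1, keepdims=True)


def to_barycentric_2d(p):
    """Map simplex points to 2D (assumed ternary-plot convention, see lead-in)."""
    u = p[..., 1] + 0.5 * p[..., 2]
    v = (np.sqrt(3.0) / 2.0) * p[..., 2]
    return u, v


def decompose(z, cdfs):
    """z of shape (N, 3) -> (sum, u, v), each of shape (N,)."""
    # Per-dimension CDF maps each coordinate into (0, 1).
    c = np.stack([np.interp(z[:, i], *cdfs[i]) for i in range(3)], axis=1)
    total = c.sum(axis=1)
    u, v = to_barycentric_2d(simplex(c))
    return total, u, v


rng = np.random.default_rng(0)
population = rng.normal(size=(1000, 3))
cdfs = [fit_empirical_cdf(population[:, i]) for i in range(3)]
total, u, v = decompose(rng.normal(size=(5, 3)), cdfs)
```

With the sequence in one place, `build()` and the runtime path would call the same function, so a change to the transform cannot drift between training and inference.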

---

## 5. Minimum Viable Codebook

### 5.1 Required Functions for Production

Based on the production spec (`codebook.md`), the minimum viable codebook needs:

1. **`project(activations) → z_coords`**: SVD projection (matrix multiply + centering)
2. **`decompose(z_coords) → (sum, u, v)`**: CDF → simplex → barycentric
3. **`score(z_coords) → list[DimensionSignal]`**: Per-direction scoring against profiles
4. **`detect(z_coords, threshold) → DetectionResult`**: Threshold comparison + flagging
5. **`load(path) → Codebook`**: Deserialize from safetensors + JSON
6. **SplineDistribution**: CDF evaluation for decompose

And for the **training pipeline** (not runtime):
7. **`build(population_data, direction_data) → Codebook`**: SVD, spline fitting, classifier training

### 5.2 Compression Estimate

| Source | Lines | Classification | Production Lines |
|--------|-------|----------------|------------------|
| `firewall_codebook.py` | 1,245 | Core + research + dead | ~350 |
| `spline.py` (used parts) | ~178 | Core library | ~180 |
| `transform.py` (used parts) | ~3 | Core library | ~5 |
| **Total PoC dependency** | **~1,426** | | **~535** |

**Target estimate: 400–500 lines for runtime codebook, 150–200 lines for training pipeline.**

Breakdown of production targets:

| Module | Target Lines | Contents |
|--------|-------------|----------|
| `codebook/transforms.py` | ~30 | `simplex()`, `reverse_bary3d()`, `bary_to_simplex()` |
| `codebook/splines.py` | ~180 | `MonotonicCubicSpline`, `SplineDistribution`, `ensure_strictly_increasing` |
| `codebook/profiles.py` | ~30 | `DirectionProfile` dataclass |
| `codebook/classifiers.py` | ~20 | `DirectionClassifier` dataclass |
| `codebook/results.py` | ~15 | `DetectionResult` dataclass |
| `codebook/projection.py` | ~30 | `project()` and `decompose()` |
| `codebook/detection.py` | ~50 | `detect()` with rolling window, threshold logic |
| `codebook/codebook.py` | ~40 | `Codebook` class (init, load, summary) |
| `training/compiler.py` | ~150 | `build()` — SVD, spline fitting, profile computation |
| `training/stats.py` | ~25 | `pooled_std()`, `cohen_d()`, silhouette |
| **Total** | **~570** | |

This is **46% of the PoC's 1,245 lines**, or, counting the full 502-line metaspline core as well, **~33% of the ~1,745 total lines** referenced in the overview.

### 5.3 What Gets Cut

| Lines Cut | Source | Reason |
|-----------|--------|--------|
| ~130 | `HistogramClassifier` + `classify_histogram()` + histogram training | Alternative approach, not MVP |
| ~95 | `evaluate_auc()` | Testing/benchmarking tool |
| ~60 | `summary()` | Debugging tool, not runtime |
| ~75 | `__main__` block (including duplicated code) | Script-mode evaluation |
| ~40 | `classify()` method | Subsumed by `detect()` |
| ~30 | `build_codebook_from_precomputed()` | Training I/O, not runtime |
| ~124 | Unused metaspline code (DensitySpline, unfold/fold, dcs_norm, etc.) | Dead code |
| ~50 | Repeated decomposition sequences | DRY refactoring |

---

## 6. Proposed Decomposition

Matching the production package structure from `codebook.md`:

```
src/alknet_firewall/
├── codebook/
│   ├── __init__.py          # Public exports
│   ├── codebook.py          # Codebook class (init, load, project, score)
│   ├── transforms.py        # simplex, reverse_bary3d, bary_to_simplex
│   ├── splines.py           # MonotonicCubicSpline, SplineDistribution
│   ├── profiles.py          # DirectionProfile, population stats
│   ├── classifiers.py       # DirectionClassifier (logistic weights)
│   ├── results.py           # DetectionResult, DimensionSignal, AlarmLevel
│   ├── projection.py        # project(), decompose()
│   └── detection.py         # detect(), threshold comparison, rolling window
├── training/
│   ├── __init__.py
│   ├── compiler.py          # build() — SVD, spline fitting, profile comp
│   ├── stats.py             # pooled_std, cohen_d, silhouette
│   └── data_loader.py       # Condition catalog, prompt sets, data loading
└── data/
    └── codebook/
        ├── basis.safetensors
        ├── regions.safetensors
        ├── splines.json
        └── config.json
```

### 6.1 Key Design Decisions for Extraction

1. **SplineDistribution stays in `codebook/splines.py`** — it's a general-purpose distribution class used at both training and inference time. No need for a separate package.

2. **`simplex()` moves to `codebook/transforms.py`** — it's a single pure function (3 lines), no need for the `transform.py` dependency chain.

3. **`unfold`/`fold` from `space.py` are dropped** — never used by the codebook.

4. **`DirectionProfile` and `DirectionClassifier` become separate dataclass modules** — clean separation of data from logic.

5. **`build()` moves entirely to `training/compiler.py`** — runtime codebook is read-only. This is the biggest architectural change: the codebook class should not have a `build()` classmethod.

6. **Decompose becomes a pure function** — `decompose(z, splines)` is a pure mathematical transform with no state dependencies beyond the splines. Making it a standalone function enables testing.

7. **Detection is separate from the codebook class** — `detect(z, classifiers, profiles, threshold)` is a stateless function given the codebook data. This enables swapping detection strategies without touching the codebook.
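
Decisions 4, 5 and 7 can be illustrated with frozen dataclasses and a stateless `detect()`. This is a sketch under several assumptions: the field names, the simplified `detect(features, classifiers, threshold, window)` signature (profiles omitted), and the rule "flag if the peak rolling-mean probability reaches the threshold" are illustrative and are not taken from the PoC.

```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class DirectionClassifier:
    """Logistic weights for one direction: 3 floats + intercept (read-only)."""
    name: str
    weights: tuple[float, float, float]
    intercept: float


@dataclass(frozen=True)
class DetectionResult:
    """Hypothetical result shape: overall flag plus per-direction peak score."""
    flagged: bool
    scores: dict[str, float]


def rolling_mean(x: np.ndarray, window: int) -> np.ndarray:
    """Mean over each full window of consecutive positions."""
    return np.convolve(x, np.ones(window) / window, mode="valid")


def detect(features, classifiers, threshold=0.5, window=4) -> DetectionResult:
    """Stateless detection over per-position (sum, u, v) rows of shape (T, 3)."""
    scores = {}
    for clf in classifiers:
        logits = features @ np.asarray(clf.weights) + clf.intercept
        probs = 1.0 / (1.0 + np.exp(-logits))
        w = min(window, len(probs))  # shrink the window for short inputs
        scores[clf.name] = float(rolling_mean(probs, w).max())
    flagged = any(score >= threshold for score in scores.values())
    return DetectionResult(flagged=flagged, scores=scores)


features = np.zeros((8, 3))
hot = DirectionClassifier("harmful", (0.0, 0.0, 0.0), 1.0)
cold = DirectionClassifier("injection", (0.0, 0.0, 0.0), -5.0)
result = detect(features, [hot, cold], threshold=0.5, window=4)
```

Because `detect()` receives everything it needs as arguments, a different strategy (for example a different window rule) can be swapped in without changing the data classes.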

---

## 7. Testing Data

### 7.1 Saved Artifacts Referenced in Code

The PoC references these saved data files:

| File | Path | Contents | Reusable for Testing |
|------|------|----------|---------------------|
| Population precomputed | `saved_data/precomputed_seed42_qwen3_0.6b.pt` | z_coords, P_mean, perturbation_svd_Vh | Yes — basis for integration tests |
| Population precomputed | `saved_data/precomputed_seed42_qwen3_1.7b.pt` | Same for 1.7B model | Yes — multi-model test |
| Population precomputed | `saved_data/precomputed_seed42_qwen3_4b.pt` | Same for 4B model | Yes — multi-model test |
| Direction geometry | `experiments/direction_geometry/results/Qwen_Qwen3-0.6B_manifold_projection.pt` | Full condition data + SVD | Yes — golden data for codebook compilation |
| Direction geometry | `experiments/direction_geometry/results/Qwen_Qwen3-1.7B_manifold_projection.pt` | Same for 1.7B | Yes |
| Contrast pairs | Hardcoded in `build()` L268–276 and `run_manifold_projection.py` L139–148 | 7 behavioral contrasts | Yes — test fixture definition |

### 7.2 Validation Results Referenced

The `__main__` block (L1121–1245) contains:
- AUC evaluation at window sizes [1, 4, 8, 16]
- Per-direction AUC scores for both continuous and histogram classifiers
- Per-token AUC evaluation

These results should be captured as **golden test fixtures** for the production codebook:
- Build a codebook from the 0.6B precomputed data
- Verify that AUC scores match expected ranges
- Verify that detection decisions match expected flags

### 7.3 Calibration Data for Testing

For unit/integration tests, we need:

1. **Synthetic z-coord population**: Small N=1000 tensor for spline fitting tests
2. **Known-contrast z-coords**: Small pairs (harmful/harmless) for direction profile tests
3. **Expected spline parameters**: Known knot positions/coefficients for regression tests
4. **Expected detection results**: For a given input, what does `detect()` return?

The PoC's `build_codebook_from_precomputed()` provides a ready-made path to generate these fixtures from the saved `.pt` files.
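
For items 1 and 2, a purely synthetic fixture is also an option when the saved `.pt` files are not available (for example in CI). The sketch below is one hypothetical way to generate it; the function name `make_synthetic_fixture()`, the Gaussian population, and the fixed shift along the first axis are assumptions chosen only to give tests a known, reproducible contrast.

```python
import numpy as np


def make_synthetic_fixture(seed=42, n_population=1000, n_per_condition=50, shift=1.5):
    """Deterministic synthetic z-coords: a population plus one contrast pair."""
    rng = np.random.default_rng(seed)
    population = rng.normal(size=(n_population, 3))
    harmless = rng.normal(size=(n_per_condition, 3))
    harmful = rng.normal(size=(n_per_condition, 3))
    # Displace the "harmful" condition along the first z dimension only,
    # so a correct profile computation must report a positive effect there.
    harmful[:, 0] += shift
    return {"population": population, "harmless": harmless, "harmful": harmful}


fixture = make_synthetic_fixture()
again = make_synthetic_fixture()

# Same seed, same data: tests built on this fixture are reproducible.
same = all(np.array_equal(fixture[key], again[key]) for key in fixture)
gap = fixture["harmful"][:, 0].mean() - fixture["harmless"][:, 0].mean()
```

Synthetic data checks the plumbing only; the AUC ranges and detection flags from §7.2 still have to come from the real precomputed artifacts.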

---

## Summary

### Key Findings

1. **The 1,245-line PoC contains ~480 lines of essential code**. Including the metaspline core dependency (~178 lines used), the total essential code is ~658 lines. With dead code and research artifacts removed, the production codebook should target **400–500 lines** for runtime + **150–200 lines** for training.

2. **The decomposition pipeline (z → CDF → simplex → bary → (sum,u,v)) is repeated 5 times** in the PoC. Extracting it into a single `decompose()` function saves ~50 lines and eliminates a bug surface.

3. **The metaspline core has ~65% unused code** when viewed from the codebook's perspective. Only `SplineDistribution`, `MonotonicCubicSpline`, `ensure_strictly_increasing`, and `simplex()` are needed — the rest (DensitySpline, unfold/fold, dcs_norm, etc.) can be dropped entirely.

4. **The histogram classifier (2×2×2 discretized approach) is an exploratory alternative**, not the primary detection mechanism. The continuous logistic classifier is superior (higher AUC) and should be the MVP approach. The histogram classifier adds ~130 lines and can be deferred.

5. **The `build()` method is the largest single function (429 lines)** and mixes training with runtime state. It must be decomposed: training logic moves to `training/compiler.py`, runtime state becomes immutable serialized data.

6. **Saved `.pt` files from the PoC provide golden test data** — the manifold projection results for Qwen3-0.6B and 1.7B can be reused directly for integration tests.

### Recommendation

**Target: 500–600 lines total** for the production codebook (runtime + training), down from 1,245 lines in the PoC and 1,745 lines including metaspline core. This is a **~65% compression**.

The architecture should separate:
- **Runtime** (~400 lines): `Codebook`, transforms, splines, detection, results
- **Training** (~150 lines): compiler, stats, data loading
- **Data** (bundled): safetensors + JSON, no Python

### Next Steps

1. Create `src/alknet_firewall/codebook/` package structure
2. Extract `transforms.py` (simplex, barycentric) — trivial, ~30 lines
3. Port `splines.py` (MonotonicCubicSpline + SplineDistribution) — ~180 lines, mostly copy with cleanup
4. Implement `projection.py` (project, decompose) — thin wrappers, ~30 lines
5. Implement `detection.py` (detect with rolling window) — ~50 lines, port from PoC's detect()
6. Implement `codebook.py` (Codebook class with load) — ~40 lines
7. Extract `training/compiler.py` from `build()` — most complex extraction, ~150 lines
8. Create test fixtures from saved `.pt` data
9. Verify round-trip: build from .pt → serialize → load → detect matches PoC output