docs: add copula decomposition pipeline, clarify detection data flow

The architecture specs previously described detection as a single-vector
path (one activation → one z-coordinate → one alarm), but the PoC operates
on per-token z-coordinate sequences with a two-stage copula decomposition.

Key updates:
- codebook.md: Add sections on Copula Decomposition (z → CDF → simplex →
  barycentric → (S, u, v)), Direction Profiles and Contrast Pairs, and
  Token-Level Smoothing; add classifier weights and direction profiles to
  the data format; update the Internal API with decompose/classify/detect
  methods
- codebook.md: Clarify z-coordinate shapes — training is (N, 3) flattened
  per-token positions, inference is (seq_len, 3) per-token sequence
- firewall.md: Update data flow to 10-step pipeline including copula
  decomposition, smoothing, and direction classification; update score
  composition to use direction-level P(active); update DimensionSignal
  dataclass; update latency budget with copula/smoothing/classification steps
- model.md: Add Phase 1 (last-token) vs Phase 2 (per-token) extraction modes
- ADR-009: Note that last-token extraction is a Phase 1 simplification and
  per-token extraction is the full pipeline
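
As an illustration of the two z-coordinate shapes mentioned above, the following is a minimal sketch rather than the project's code. The alias `Z`, the helper `flatten_for_training`, and all numeric values are hypothetical; the only thing taken from the commit message is that training data is a flattened (N, 3) collection of per-token positions and inference input is a (seq_len, 3) per-token sequence.

```python
from typing import List, Tuple

# Hypothetical alias: one 3-dimensional z-coordinate for a single token.
Z = Tuple[float, float, float]


def flatten_for_training(sequences: List[List[Z]]) -> List[Z]:
    """Concatenate per-token z-coordinates from many sequences into (N, 3)."""
    return [z for seq in sequences for z in seq]


# Two toy sequences of lengths 2 and 3; the values are made up.
seq_a: List[Z] = [(0.1, -0.2, 0.3), (0.0, 0.5, -0.1)]
seq_b: List[Z] = [(1.0, 0.0, 0.0), (0.2, 0.2, 0.2), (-0.4, 0.1, 0.9)]

# Training view: sequence boundaries are dropped, N = 2 + 3 = 5.
training_z = flatten_for_training([seq_a, seq_b])

# Inference view: one sequence at a time, token order preserved.
inference_z = seq_b

print((len(training_z), len(training_z[0])))    # shape of the training view
print((len(inference_z), len(inference_z[0])))  # shape of the inference view
```

The sketch keeps token order in the inference view because token-level smoothing operates along the sequence, whereas the flattened training view treats every position as an independent sample.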
2026-06-13 08:17:09 +00:00
parent 7d8a39a88a
commit 45a0e0798c
4 changed files with 300 additions and 72 deletions

@@ -30,8 +30,14 @@ input.
## Decision
-Extract the last token's hidden state at each configured layer. This is
-standard for LLaMA-family models and provides full-sequence context.
+Extract the last token's hidden state at each configured layer as the Phase 1
+default. This is standard for LLaMA-family models and provides full-sequence
+context.
+
+Phase 2 extends this to per-token extraction (hidden states at every position)
+to enable token-level smoothing and per-position behavioral classification.
+The training pipeline already uses per-token extraction for calibration data
+collection.
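
The difference between the two extraction modes can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names `extract_last_token` and `extract_per_token` are hypothetical, and one layer's output is modeled as a plain list of per-position vectors instead of a framework tensor.

```python
from typing import List

Vector = List[float]


def extract_last_token(hidden_states: List[Vector]) -> Vector:
    """Phase 1 (assumed shape): keep only the final position's hidden state."""
    if not hidden_states:
        raise ValueError("cannot extract from an empty sequence")
    return list(hidden_states[-1])


def extract_per_token(hidden_states: List[Vector]) -> List[Vector]:
    """Phase 2 (assumed shape): keep the hidden state at every position."""
    return [list(h) for h in hidden_states]


# Toy layer output with seq_len = 3 and hidden_dim = 2; values are made up.
layer_output = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

phase1 = extract_last_token(layer_output)  # one vector per layer
phase2 = extract_per_token(layer_output)   # one vector per token per layer

print(len(phase1))                  # hidden_dim
print(len(phase2), len(phase2[0]))  # seq_len, hidden_dim
```

In this sketch the Phase 1 result is simply the last row of the Phase 2 result, so a per-token extraction path could serve both modes; whether the real pipeline shares the path this way is not stated in the source.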
## Consequences
@@ -40,6 +46,7 @@ standard for LLaMA-family models and provides full-sequence context.
- Full sequence context via causal attention
- Single vector per layer — simple to project and score
- No padding sensitivity (unlike mean pooling with attention masks)
+- Phase 1 simplification: reduces implementation complexity and latency
**Negative**:
- Position-dependent — the last token's representation is influenced by its
@@ -48,6 +55,8 @@ standard for LLaMA-family models and provides full-sequence context.
activation patterns
- May miss patterns in long inputs where the adversarial payload is in the
middle rather than the end
+- Phase 1 only: misses token-level behavioral signals that require per-token
+extraction (addressed in Phase 2)
## References