docs: add copula decomposition pipeline, clarify detection data flow

The architecture specs previously described detection as a single-vector path (one activation → one z-coordinate → one alarm), but the PoC operates on per-token z-coordinate sequences with a two-stage copula decomposition.

Key updates:

- codebook.md: Add Copula Decomposition section (z → CDF → simplex → barycentric → (S, u, v)), Direction Profiles and Contrast Pairs section, Token-Level Smoothing section, classifier weights and direction profiles to data format, updated Internal API with decompose/classify/detect methods
- codebook.md: Clarify z-coordinate shapes: training is (N, 3) flattened per-token positions, inference is (seq_len, 3) per-token sequence
- firewall.md: Update data flow to 10-step pipeline including copula decomposition, smoothing, and direction classification; update score composition to use direction-level P(active); update DimensionSignal dataclass; update latency budget with copula/smoothing/classification steps
- model.md: Add Phase 1 (last-token) vs Phase 2 (per-token) extraction modes
- ADR-009: Note last-token is Phase 1 simplification, per-token is full pipeline
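
For reviewers, the difference between the two extraction modes can be sketched roughly as follows. This is an illustrative toy, not the PoC code: the function names `extract_last_token` and `extract_per_token` are hypothetical, and one layer's output is assumed to be a plain list of per-position vectors.

```python
from typing import List

Vector = List[float]


def extract_last_token(hidden_states: List[Vector]) -> Vector:
    """Phase 1 (hypothetical helper): keep only the final position's vector."""
    if not hidden_states:
        raise ValueError("hidden_states must contain at least one position")
    return hidden_states[-1]


def extract_per_token(hidden_states: List[Vector]) -> List[Vector]:
    """Phase 2 (hypothetical helper): keep a copy of the vector at every position."""
    return [list(h) for h in hidden_states]


# Toy output of one layer: seq_len = 4, hidden_dim = 2 (values are made up).
layer_output = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

phase1 = extract_last_token(layer_output)  # one vector per layer
phase2 = extract_per_token(layer_output)   # seq_len vectors per layer
```

Phase 1 yields a single vector per layer, while Phase 2 yields one vector per position, which is what token-level smoothing and per-position classification would need as input.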
2026-06-13 08:17:09 +00:00
parent 7d8a39a88a
commit 45a0e0798c
4 changed files with 300 additions and 72 deletions
--- a/docs/architecture/decisions/009-last-token-extraction.md
+++ b/docs/architecture/decisions/009-last-token-extraction.md
@@ -30,8 +30,14 @@ input.

 ## Decision

-Extract the last token's hidden state at each configured layer. This is
-standard for LLaMA-family models and provides full-sequence context.
+Extract the last token's hidden state at each configured layer as the Phase 1
+default. This is standard for LLaMA-family models and provides full-sequence
+context.
+
+Phase 2 extends this to per-token extraction (hidden states at every position)
+to enable token-level smoothing and per-position behavioral classification.
+The training pipeline already uses per-token extraction for calibration data
+collection.

 ## Consequences

@@ -40,6 +46,7 @@ standard for LLaMA-family models and provides full-sequence context.
 - Full sequence context via causal attention
 - Single vector per layer — simple to project and score
 - No padding sensitivity (unlike mean pooling with attention masks)
+- Phase 1 simplification: reduces implementation complexity and latency

 **Negative**:
 - Position-dependent — the last token's representation is influenced by its
@@ -48,6 +55,8 @@ standard for LLaMA-family models and provides full-sequence context.
  activation patterns
 - May miss patterns in long inputs where the adversarial payload is in the
  middle rather than the end
+- Phase 1 only: misses token-level behavioral signals that require per-token
+  extraction (addressed in Phase 2)

 ## References