docs: add copula decomposition pipeline, clarify detection data flow
The architecture specs previously described detection as a single-vector
path (one activation → one z-coordinate → one alarm), but the PoC operates on
per-token z-coordinate sequences with a two-stage copula decomposition.

Key updates:

- codebook.md: Add a Copula Decomposition section (z → CDF → simplex →
  barycentric → (S, u, v)), a Direction Profiles and Contrast Pairs section,
  and a Token-Level Smoothing section; add classifier weights and direction
  profiles to the data format; update the Internal API with
  decompose/classify/detect methods
- codebook.md: Clarify z-coordinate shapes (training: (N, 3) flattened
  per-token positions; inference: (seq_len, 3) per-token sequence)
- firewall.md: Update the data flow to a 10-step pipeline including copula
  decomposition, smoothing, and direction classification; update score
  composition to use direction-level P(active); update the DimensionSignal
  dataclass; update the latency budget with copula/smoothing/classification
  steps
- model.md: Add Phase 1 (last-token) vs Phase 2 (per-token) extraction modes
- ADR-009: Note that last-token extraction is a Phase 1 simplification and
  per-token extraction is the full pipeline
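The commit message fixes two z-coordinate shapes: training works on (N, 3) flattened per-token positions, while inference works on one (seq_len, 3) per-token sequence. A minimal NumPy sketch of that shape contract follows, assuming each prompt's z-coordinates arrive as a (seq_len_i, 3) array. `flatten_for_training`, `as_inference_sequence`, and the returned `offsets` are hypothetical names for illustration, not the PoC's codebook API.

```python
import numpy as np

Z_DIM = 3  # z-coordinate width stated in the commit message


def _check_sequence(z: np.ndarray) -> None:
    # Each sequence must be (seq_len, 3): one row per token position.
    if z.ndim != 2 or z.shape[1] != Z_DIM:
        raise ValueError(f"expected shape (seq_len, {Z_DIM}), got {z.shape}")


def flatten_for_training(z_sequences: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Stack per-token positions from many prompts into one (N, 3) matrix.

    Hypothetical helper. Also returns offsets so that rows
    offsets[k]:offsets[k + 1] of the matrix belong to prompt k.
    """
    if not z_sequences:
        raise ValueError("need at least one sequence")
    for z in z_sequences:
        _check_sequence(z)
    lengths = [z.shape[0] for z in z_sequences]
    offsets = np.concatenate([[0], np.cumsum(lengths)])
    return np.concatenate(z_sequences, axis=0), offsets


def as_inference_sequence(z: np.ndarray) -> np.ndarray:
    """Keep one prompt's z-coordinates as an ordered (seq_len, 3) sequence."""
    z = np.asarray(z, dtype=float)
    _check_sequence(z)
    return z


if __name__ == "__main__":
    seqs = [np.zeros((5, 3)), np.ones((7, 3)), np.full((3, 3), 2.0)]
    z_train, offsets = flatten_for_training(seqs)
    print(z_train.shape)                          # (15, 3)
    print(offsets.tolist())                       # [0, 5, 12, 15]
    print(as_inference_sequence(seqs[1]).shape)   # (7, 3)
```

Returning `offsets` is a choice made for this sketch: it lets a flattened training row be traced back to its prompt and position. Inference never flattens, so token order stays intact for per-position steps such as token-level smoothing.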
@@ -30,8 +30,14 @@ input.
 
 ## Decision
 
-Extract the last token's hidden state at each configured layer. This is
-standard for LLaMA-family models and provides full-sequence context.
+Extract the last token's hidden state at each configured layer as the Phase 1
+default. This is standard for LLaMA-family models and provides full-sequence
+context.
+
+Phase 2 extends this to per-token extraction (hidden states at every position)
+to enable token-level smoothing and per-position behavioral classification.
+The training pipeline already uses per-token extraction for calibration data
+collection.
 
 ## Consequences
 
@@ -40,6 +46,7 @@ standard for LLaMA-family models and provides full-sequence context.
 - Full sequence context via causal attention
 - Single vector per layer — simple to project and score
 - No padding sensitivity (unlike mean pooling with attention masks)
+- Phase 1 simplification: reduces implementation complexity and latency
 
 **Negative**:
 - Position-dependent — the last token's representation is influenced by its
@@ -48,6 +55,8 @@ standard for LLaMA-family models and provides full-sequence context.
 activation patterns
 - May miss patterns in long inputs where the adversarial payload is in the
 middle rather than the end
+- Phase 1 only: misses token-level behavioral signals that require per-token
+extraction (addressed in Phase 2)
 
 ## References
 
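The ADR-009 Decision text in the diff separates Phase 1 (last-token) from Phase 2 (per-token) extraction. A minimal NumPy sketch of the difference follows, assuming each configured layer's hidden states for one sequence are a (seq_len, hidden_dim) array and an optional attention mask marks real tokens with 1. The function names, the dict-of-layers layout, and the layer indices in the usage block are hypothetical, not taken from the project.

```python
from typing import Optional, Sequence

import numpy as np


def _real_positions(seq_len: int, attention_mask: Optional[np.ndarray]) -> np.ndarray:
    # Indices of non-padding tokens; with no mask, every position is real.
    if attention_mask is None:
        return np.arange(seq_len)
    if attention_mask.shape != (seq_len,):
        raise ValueError(f"mask shape {attention_mask.shape} != ({seq_len},)")
    positions = np.flatnonzero(attention_mask)
    if positions.size == 0:
        raise ValueError("attention mask has no real tokens")
    return positions


def extract_last_token(hidden: dict[int, np.ndarray], layers: Sequence[int],
                       attention_mask: Optional[np.ndarray] = None) -> dict[int, np.ndarray]:
    """Phase 1 (hypothetical helper): one (hidden_dim,) vector per layer."""
    out = {}
    for layer in layers:
        h = hidden[layer]
        last = _real_positions(h.shape[0], attention_mask)[-1]  # last real token
        out[layer] = h[last]
    return out


def extract_per_token(hidden: dict[int, np.ndarray], layers: Sequence[int],
                      attention_mask: Optional[np.ndarray] = None) -> dict[int, np.ndarray]:
    """Phase 2 (hypothetical helper): a (num_real_tokens, hidden_dim) matrix per layer."""
    out = {}
    for layer in layers:
        h = hidden[layer]
        out[layer] = h[_real_positions(h.shape[0], attention_mask)]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [12, 16]  # hypothetical configured layers
    hidden = {layer: rng.standard_normal((6, 8)) for layer in layers}
    mask = np.array([1, 1, 1, 1, 0, 0])  # 4 real tokens, 2 padding slots
    phase1 = extract_last_token(hidden, layers, mask)
    phase2 = extract_per_token(hidden, layers, mask)
    print(phase1[12].shape)  # (8,)
    print(phase2[12].shape)  # (4, 8)
```

Because the sketch picks the last non-zero mask position rather than index -1, the Phase 1 vector is the same whether padding sits on the left or the right, in line with the "no padding sensitivity" point listed under the positive consequences. The last row of each Phase 2 matrix equals the Phase 1 vector for that layer.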