alknet-firewall

Author	SHA1	Message	Date
glm-5.1	c225cf420c	docs: resolve OQ-03 — adopt rolling token window screening (ADR-012) Research confirmed rolling token windows as the right approach for long document screening. ADR-012 formalizes the decision: Phase 2 implements screen_document() with 25% overlap (512 tokens for SmolLM2-135M), max pooling aggregation, and character offset tracking. Short inputs fall through to screen() unchanged. This resolves the last open question. All 6 original OQs are now resolved: - OQ-01: ONNX removed (burn/cublas better future path) - OQ-02: 65% codebook compression achievable - OQ-03: Rolling token windows for Phase 2 (ADR-012) - OQ-04: Both model-specific defaults + user-overridable - OQ-05: Standalone API + thin adapters (ADR-011) - OQ-06: TOML for file-based config	2026-06-13 08:25:12 +00:00
glm-5.1	45a0e0798c	docs: add copula decomposition pipeline, clarify detection data flow The architecture specs previously described detection as a single-vector path (one activation → one z-coordinate → one alarm), but the PoC operates on per-token z-coordinate sequences with a two-stage copula decomposition. Key updates: - codebook.md: Add Copula Decomposition section (z → CDF → simplex → barycentric → (S, u, v)), Direction Profiles and Contrast Pairs section, Token-Level Smoothing section, classifier weights and direction profiles to data format, updated Internal API with decompose/classify/detect methods - codebook.md: Clarify z-coordinate shapes — training is (N, 3) flattened per-token positions, inference is (seq_len, 3) per-token sequence - firewall.md: Update data flow to 10-step pipeline including copula decomposition, smoothing, and direction classification; update score composition to use direction-level P(active); update DimensionSignal dataclass; update latency budget with copula/smoothing/classification steps - model.md: Add Phase 1 (last-token) vs Phase 2 (per-token) extraction modes - ADR-009: Note last-token is Phase 1 simplification, per-token is full pipeline	2026-06-13 08:17:09 +00:00
glm-5.1	7d8a39a88a	docs: resolve 4 open questions, add research, spec codebook package structure Research-driven resolution of OQ-01, OQ-02, OQ-05, OQ-06: - OQ-01: Remove ONNX Runtime from scope entirely — doesn't support activation extraction natively (optimum #972 closed as not planned), bloated model exports; burn/cublas via safetensors is a better future path - OQ-02: Codebook compresses ~65% (1,245 → 500-600 lines); add Package Structure and Extraction from PoC sections to codebook.md based on PoC analysis of metaspline firewall_codebook.py - OQ-05: Standalone API + thin adapter pattern (ADR-011); Phase 1 ships Firewall.screen() only, Phase 2 adds <100-line adapter packages for LlamaFirewall, OpenAI Agents SDK, NeMo Guardrails - OQ-06: TOML for file-based config — standard modern Python, two-way door Also: research OQ-03 rolling windows from taskgraph-semantic reference code, remove onnxruntime/optimum from dependencies, move streaming screening to Phase 2, add burn/cublas as Phase 3 alternative backend.	2026-06-13 07:27:40 +00:00
glm-5.1	11620e8398	docs: resolve OQ-04, remove OQ-07, enrich OQ-03 with rolling windows - OQ-04 resolved: thresholds are both model-specific (shipped with codebook) and user-overridable. Inspired by platonic representation hypothesis — calibrated models converge on similar behavioral patterns. - OQ-07 removed: Rust port is an alknet project concern, not relevant to the Python package architecture. Removed from overview.md Phase 3. - OQ-03 enriched: rolling window token screening for granular detection in documents (PDF→markdown use case, academic paper injection detection). Upgraded from low to medium priority. - OQ-01 updated: likely path is PyTorch first, ONNX export by default. - OQ-05 updated: needs deep dive into guardrail landscape. - Updated threshold description in configuration.md with platonic representation context.	2026-06-13 05:47:44 +00:00
glm-5.1	cf464c2296	feat: initial architecture specification and research Phase 0→1 setup for alknet-firewall — a behavioral signal detection library that screens untrusted LLM inputs using small model activations. Architecture docs (5 specs, 10 ADRs, 7 open questions): - overview: vision, scope, dependencies, package structure - firewall: core API, alarm protocol, score composition, error handling - codebook: SVD basis, spline distributions, calibration, tensor format - model: activation extraction, model-agnostic interface, lazy loading - configuration: thresholds, model selection, detection tuning Research reports: - modern-python-project-setup: uv, pyproject.toml, src layout, ruff, CI - python-ml-packaging: optional PyTorch, HF Hub download, safetensors - llm-input-safety-landscape: threat taxonomy, defenses, academic evidence Agent role adaptations for Python project (replaced Rust conventions).	2026-06-13 05:17:40 +00:00

5 Commits
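
The rolling token window screening adopted in c225cf420c (ADR-012) can be illustrated with a minimal sketch. It is not the project's screen_document() implementation. The names rolling_windows, screen_document_sketch, WindowResult, and the score_fn callback are hypothetical, and the per-window scorer is a stand-in for whatever the real screen() call returns. The sketch assumes each token carries a (char_start, char_end) offset pair, that overlap is given as a token count (a quarter of the window for the 25% figure in the commit), and that the document-level score is the maximum over windows.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class WindowResult:
    token_start: int  # inclusive token index
    token_end: int    # exclusive token index
    char_start: int   # character offset of the first token in the window
    char_end: int     # character offset just past the last token in the window
    score: float


def rolling_windows(n_tokens: int, window: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) token spans covering n_tokens with the given overlap."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if n_tokens == 0:
        return []
    stride = window - overlap
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:  # last window reaches the end of the input
            break
        start += stride
    return spans


def screen_document_sketch(
    offsets: Sequence[Tuple[int, int]],
    score_fn: Callable[[int, int], float],  # hypothetical per-window scorer
    window: int,
    overlap: int,
) -> Tuple[float, List[WindowResult]]:
    """Score each window, keep character offsets, aggregate by max pooling."""
    results = []
    for start, end in rolling_windows(len(offsets), window, overlap):
        results.append(
            WindowResult(
                token_start=start,
                token_end=end,
                char_start=offsets[start][0],
                char_end=offsets[end - 1][1],
                score=score_fn(start, end),
            )
        )
    # Max pooling: the document is as suspicious as its most suspicious window.
    aggregate = max((r.score for r in results), default=0.0)
    return aggregate, results


if __name__ == "__main__":
    # Ten fake tokens, each one character wide with a one-character gap.
    offsets = [(2 * i, 2 * i + 1) for i in range(10)]
    # Toy scorer: later windows score higher. A real scorer would run the model.
    agg, results = screen_document_sketch(
        offsets, lambda s, e: e / 10, window=4, overlap=1
    )
    for r in results:
        print(r)
    print("aggregate:", agg)
```

An input no longer than one window yields a single span, which corresponds to the commit's note that short inputs fall through to screen() unchanged. Max pooling keeps a single high-scoring window from being diluted by benign text around it, and the stored character offsets let a caller point back to the flagged region of the original document.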