alknet-firewall/docs/architecture/open-questions.md
glm-5.1 11620e8398 docs: resolve OQ-04, remove OQ-07, enrich OQ-03 with rolling windows
- OQ-04 resolved: thresholds are both model-specific (shipped with
  codebook) and user-overridable. Inspired by platonic representation
  hypothesis — calibrated models converge on similar behavioral patterns.
- OQ-07 removed: Rust port is an alknet project concern, not relevant
  to the Python package architecture. Removed from overview.md Phase 3.
- OQ-03 enriched: rolling window token screening for granular detection
  in documents (PDF→markdown use case, academic paper injection detection).
  Upgraded from low to medium priority.
- OQ-01 updated: likely path is PyTorch first, ONNX export by default.
- OQ-05 updated: needs deep dive into guardrail landscape.
- Updated threshold description in configuration.md with platonic
  representation context.
2026-06-13 05:47:44 +00:00


# Open Questions
Centralized tracker for unresolved questions across all architecture documents.
## Theme: Inference Backend
### OQ-01: Should ONNX Runtime be a supported inference backend in Phase 1?
- **Origin**: [model.md](model.md), [overview.md](overview.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending — needs research into ONNX export path)
- **Cross-references**: ADR-006

ONNX Runtime has a much smaller install footprint (roughly 30-50 MB, versus 200 MB to 2.5 GB
for PyTorch) and is well suited to inference-only use. Hugging Face's `optimum`
library provides drop-in replacement classes. However, supporting it in Phase 1
adds complexity: the model must be exported to ONNX format, the `optimum` integration
must be tested, and the activation extraction API may differ from PyTorch's. For
example, ONNX Runtime only returns tensors declared as graph outputs, so the
intermediate activations the firewall reads would likely need to be exposed as
outputs at export time rather than captured at runtime (e.g. with PyTorch forward hooks).

The likely path is to build with PyTorch first, then ship an ONNX export as the default.
Research is needed to confirm activation extraction compatibility and
ONNX export quality for SmolLM2-135M. Leave open for now.

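
A minimal sketch of what "ONNX by default, PyTorch first" could mean at the API level, assuming the package detects which optional dependency is installed. The backend names, `select_backend`, and the fallback order are hypothetical, and the sketch deliberately says nothing about the harder activation-extraction question.

```python
import importlib.util
from typing import Callable, Optional

# Hypothetical backend names mapped to the package each one needs.
BACKEND_PACKAGES = {
    "onnx": "onnxruntime",
    "pytorch": "torch",
}

# Assumed preference order: ONNX when available, PyTorch as the fallback.
DEFAULT_ORDER = ("onnx", "pytorch")


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package) is not None


def select_backend(
    requested: Optional[str] = None,
    installed: Callable[[str], bool] = is_installed,
) -> str:
    """Pick an inference backend.

    An explicit request wins and fails loudly if its package is missing;
    otherwise the first installed backend in DEFAULT_ORDER is chosen.
    """
    if requested is not None:
        if requested not in BACKEND_PACKAGES:
            raise ValueError(f"unknown backend {requested!r}")
        package = BACKEND_PACKAGES[requested]
        if not installed(package):
            raise ImportError(f"backend {requested!r} needs the {package!r} package")
        return requested
    for name in DEFAULT_ORDER:
        if installed(BACKEND_PACKAGES[name]):
            return name
    raise ImportError("no inference backend installed (need onnxruntime or torch)")
```

The `installed` parameter exists so the choice can be tested without either framework present; a user who needs PyTorch-only features could still pass `requested="pytorch"` explicitly.
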
---
## Theme: Codebook Design
### OQ-02: What is the minimum viable codebook — can the 1,245-line PoC codebook be compressed?
- **Origin**: [codebook.md](codebook.md)
- **Status**: open
- **Priority**: high
- **Resolution**: (pending — dedicated research session needed)
- **Cross-references**: ADR-004

The PoC codebook is 1,245 lines, much of which may be boilerplate, dead code,
or excessive parameterization left over from the research phase. Understanding what is
essential versus exploratory is critical for the initial extraction. The codebook
training pipeline (`run_manifold_projection.py`) should also be analyzed.

Questions to consider: How many SVD dimensions are actually needed? What is the minimum
calibration dataset size? Can the spline distributions be simplified? Answering these needs a
dedicated session analyzing the PoC codebase.

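
One standard heuristic for the SVD question, offered as an assumption rather than a description of what the PoC does: keep the smallest number of dimensions k whose squared singular values account for a target fraction of the total squared singular values (the "energy" retained by the projection). `min_rank_for_energy` and the 95% default are illustrative.

```python
from typing import Sequence


def min_rank_for_energy(singular_values: Sequence[float], target: float = 0.95) -> int:
    """Smallest k such that the top-k singular values retain `target` of the energy.

    The energy of component i is sigma_i ** 2. Values are sorted in descending
    order first, so the input does not need to be pre-sorted.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    energies = sorted((s * s for s in singular_values), reverse=True)
    total = sum(energies)
    if total == 0:
        raise ValueError("all singular values are zero")
    cumulative = 0.0
    for k, energy in enumerate(energies, start=1):
        cumulative += energy
        if cumulative / total >= target:
            return k
    return len(energies)  # guard against floating-point rounding
```

With singular values `[4, 2, 1, 1]` the energies are 16, 4, 1, 1 out of 22, so a 95% target keeps 3 dimensions (21/22 ≈ 0.955) and a 90% target keeps 2 (20/22 ≈ 0.909). Sweeping `target` against detection accuracy on held-out data would be one way to find where extra dimensions stop helping.
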
---
## Theme: API Design
### OQ-03: Should the firewall support streaming/chunked input screening?
- **Origin**: [firewall.md](firewall.md)
- **Status**: open
- **Priority**: medium
- **Cross-references**: ADR-003, OQ-05

Some inputs arrive in chunks (streaming API responses, large documents). Should
the firewall screen incrementally as chunks arrive, or require the
full input before screening? Incremental screening could detect attacks earlier,
but it requires buffering and state management.

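
A minimal sketch of the buffering and state management incremental screening would involve, assuming tokens arrive as lists of strings and windows use a fixed size and stride. `StreamingWindower` and its parameters are hypothetical; the only state carried between chunks is the unconsumed tail of the buffer and how far the last emitted window reached.

```python
from typing import Iterable, List, Tuple

Window = Tuple[int, List[str]]  # (absolute start position, tokens)


class StreamingWindower:
    """Buffers streamed tokens and emits fixed-size, overlapping windows.

    Hypothetical helper: window_size and stride are illustrative parameters.
    """

    def __init__(self, window_size: int, stride: int) -> None:
        if window_size <= 0 or not 0 < stride <= window_size:
            raise ValueError("need window_size > 0 and 0 < stride <= window_size")
        self.window_size = window_size
        self.stride = stride
        self._buffer: List[str] = []  # tokens a future window may still need
        self._offset = 0              # absolute position of _buffer[0]
        self._covered_end = 0         # end position of the last emitted window

    def feed(self, tokens: Iterable[str]) -> List[Window]:
        """Append a chunk and return every window that is now complete."""
        self._buffer.extend(tokens)
        ready: List[Window] = []
        while len(self._buffer) >= self.window_size:
            ready.append((self._offset, self._buffer[: self.window_size]))
            self._covered_end = self._offset + self.window_size
            # Slide forward, keeping window_size - stride tokens of overlap.
            del self._buffer[: self.stride]
            self._offset += self.stride
        return ready

    def flush(self) -> List[Window]:
        """At end of stream, emit a final (possibly short) window for uncovered tokens."""
        end = self._offset + len(self._buffer)
        if end <= self._covered_end:
            return []  # every token has already been screened
        self._covered_end = end
        return [(self._offset, list(self._buffer))]
```

Each window returned by `feed` could be screened immediately, so an injection in an early chunk might be flagged before the rest of the input arrives; `flush` ensures the tail of the stream is never skipped.
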
**Rolling window approach**: One promising direction is rolling windows of
tokens: chunking large text into overlapping windows and screening each
window independently. This enables:

1. **Granular detection**: For the instruction firewall use case (screening
   academic papers converted from PDF to markdown), rolling windows can
   red-flag specific *sections* of a document rather than the whole thing.
   This is directly useful for catching hidden prompt injections in academic
   research papers (about 20 real examples have been found of researchers
   slipping injections past peer review).
2. **Parallel processing**: Windows can be screened in parallel, so throughput
   scales with the available workers.
3. **Large input handling**: Long documents need not be truncated; each window
   fits within the model's context length and is screened independently.

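
A minimal sketch of the three properties above, assuming the document is already tokenized and that a hypothetical `score` function returns a per-window injection score. `window_spans`, `flag_sections`, and the size, stride, and threshold values are illustrative and are not taken from the PoC's Rust code. The last window is right-aligned so the tail of a document is always screened, and overlapping flagged windows are merged into token ranges that could be mapped back to document sections.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # half-open [start, end) token range


def window_spans(n_tokens: int, size: int, stride: int) -> List[Span]:
    """Overlapping windows that cover every token, including the tail."""
    if size <= 0 or not 0 < stride <= size:
        raise ValueError("need size > 0 and 0 < stride <= size")
    if n_tokens <= size:
        return [(0, n_tokens)] if n_tokens else []
    spans = [(s, s + size) for s in range(0, n_tokens - size + 1, stride)]
    if spans[-1][1] < n_tokens:
        # Right-align a final window so the last tokens are not dropped.
        spans.append((n_tokens - size, n_tokens))
    return spans


def flag_sections(
    tokens: Sequence[str],
    score: Callable[[Sequence[str]], float],
    threshold: float,
    size: int,
    stride: int,
    workers: int = 4,
) -> List[Span]:
    """Screen windows in parallel, then merge overlapping flagged windows."""
    spans = window_spans(len(tokens), size, stride)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda sp: score(tokens[sp[0]:sp[1]]), spans))
    flagged = [sp for sp, s in zip(spans, scores) if s >= threshold]
    merged: List[Span] = []
    for start, end in flagged:  # spans are generated in ascending start order
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Threads keep the sketch simple; real model inference might instead batch windows on the accelerator. For example, with a toy scorer that returns 1.0 whenever a window contains `"X"`, the 10-token input `list("aaaaXaaaaa")` with `size=4, stride=2` flags windows (2, 6) and (4, 8), which merge into the single section (2, 8).
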
The PoC contains directional (but buggy) Rust code for creating rolling windows
that can serve as a reference when designing this feature. This connects to OQ-05
because streaming and chunking affect how the firewall composes with other
guardrail systems in a pipeline.

Leave open for Phase 1 design; the rolling window approach is the leading
candidate for Phase 2.

---
### ~~OQ-04: Should detection thresholds be per-model or globally configurable?~~
- **Origin**: [configuration.md](configuration.md), [codebook.md](codebook.md)
- **Status**: **resolved**
- **Priority**: medium
- **Resolution**: Both. Thresholds are **model-specific by default** (shipped
  with the codebook) but **globally overridable by the user**. Once calibrated,
  different models produce remarkably similar behavioral patterns. This is
  inspired by the "platonic representation hypothesis", which holds that
  different models converge on similar internal representations of the same
  data. The individual activation spaces differ, but the behavioral patterns
  they encode are consistent enough that thresholds transfer reasonably well.
  The codebook ships recommended thresholds calibrated for its model; users can
  adjust them.
- **Cross-references**: ADR-003, ADR-004
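
A minimal sketch of the resolved precedence rule, assuming thresholds are keyed by detection signal name. `CodebookThresholds`, `resolve_thresholds`, and the signal names in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class CodebookThresholds:
    """Recommended thresholds shipped with a codebook (hypothetical structure)."""

    model_id: str
    recommended: Mapping[str, float]


def resolve_thresholds(
    codebook: CodebookThresholds,
    overrides: Optional[Mapping[str, float]] = None,
) -> Dict[str, float]:
    """Start from the model-specific defaults; user overrides take precedence."""
    overrides = dict(overrides or {})
    unknown = set(overrides) - set(codebook.recommended)
    if unknown:
        # Reject typos rather than silently keeping the calibrated value.
        raise ValueError(f"no calibrated threshold named {sorted(unknown)}")
    return {**codebook.recommended, **overrides}
```

For a codebook shipping `{"injection": 0.8, "jailbreak": 0.7}`, overriding only `injection` to 0.9 yields `{"injection": 0.9, "jailbreak": 0.7}`. Rejecting unknown names is a design choice: a misspelled override would otherwise leave the calibrated threshold in force without any warning.
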
---
## Theme: Integration
### OQ-05: How should the firewall integrate with existing guardrail systems?
- **Origin**: [firewall.md](firewall.md), [overview.md](overview.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending — needs deep dive into current guardrail landscape)
- **Cross-references**: ADR-002

The behavioral firewall is complementary to text-surface defenses. Users may
want to run both Llama Guard (text classification) and alknet-firewall
(behavioral signals) in series. However, our approach is fundamentally
different: it requires access to the model itself and a codebook trained on
that model's specific behavioral signals. This means direct API-level
integration with other systems may not be straightforward.

A deep dive into the current state of guardrail integration patterns
(LlamaFirewall's scanner interface, NeMo Guardrails' Colang DSL, etc.) is
needed to determine whether we should build adapters, define a common
interface, or simply provide a clean standalone API and let users compose
systems themselves.
Leave open; will research soon.

---
## Theme: Project Setup
### OQ-06: Should file-based configuration use TOML or YAML?
- **Origin**: [configuration.md](configuration.md)
- **Status**: open
- **Priority**: low
- **Resolution**: (pending — Phase 2 concern)
- **Cross-references**: None

Phase 1 uses constructor-based configuration only. A future phase may add
file-based configuration for easier deployment. TOML is consistent with
Python packaging (`pyproject.toml`), is increasingly the standard for Python
config, and can be parsed with the standard library's `tomllib` module on
Python 3.11+, whereas YAML needs a third-party parser such as PyYAML. YAML is
more familiar in ops/ML contexts. Either works.