# Open Questions

Centralized tracker for unresolved questions across all architecture documents.

## Theme: Inference Backend

### OQ-01: Should ONNX Runtime be a supported inference backend in Phase 1?

- **Origin**: [model.md](model.md), [overview.md](overview.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending; needs research into the ONNX export path)
- **Cross-references**: ADR-006

ONNX Runtime provides a much smaller install footprint (~30-50 MB vs. 200 MB-2.5 GB
for PyTorch) and is well-suited for inference-only use. HuggingFace's `optimum`
library provides drop-in replacement classes. However, supporting it in Phase 1
adds complexity: the model must be exported to ONNX format, the `optimum`
integration must be tested, and the activation extraction API may differ from
PyTorch.

The likely path is to build with PyTorch first, then export to ONNX by default.
This needs research to confirm activation extraction API compatibility and
ONNX export quality for SmolLM2-135M. Leave open for now.

---

## Theme: Codebook Design

### OQ-02: What is the minimum viable codebook, and can the 1,245-line PoC codebook be compressed?

- **Origin**: [codebook.md](codebook.md)
- **Status**: open
- **Priority**: high
- **Resolution**: (pending; dedicated research session needed)
- **Cross-references**: ADR-004

The PoC codebook is 1,245 lines, and much of it may be boilerplate, dead code,
or excessive parameterization from the research phase. Understanding what is
essential vs. exploratory is critical for the initial extraction. The codebook
training pipeline (`run_manifold_projection.py`) should also be analyzed.

Consider: How many SVD dimensions are actually needed? What is the minimum
calibration dataset? Can spline distributions be simplified? This needs a
dedicated session to analyze the PoC codebase.
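
One possible way to approach the SVD-dimension question is to measure how much of the total squared singular-value mass the leading dimensions retain. The sketch below is illustrative only: the helper name `min_rank_for_energy`, the 95% target, and the example spectrum are assumptions made for this example, not values or code from the PoC.

```python
from itertools import accumulate


def min_rank_for_energy(singular_values, target=0.95):
    """Return the smallest k whose top-k singular values retain `target`
    of the total squared singular-value mass (hypothetical helper)."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    # Sort descending so the leading dimensions come first.
    squared = sorted((s * s for s in singular_values), reverse=True)
    total = sum(squared)
    if total == 0:
        raise ValueError("singular values must not all be zero")
    for k, cumulative in enumerate(accumulate(squared), start=1):
        if cumulative >= target * total:
            return k
    return len(squared)


# Made-up spectrum for illustration; real values would come from the PoC SVD.
example_spectrum = [10.0, 5.0, 1.0, 0.5, 0.1]
rank_95 = min_rank_for_energy(example_spectrum, target=0.95)
```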

---

## Theme: API Design

### OQ-03: Should the firewall support streaming/chunked input screening?

- **Origin**: [firewall.md](firewall.md)
- **Status**: open
- **Priority**: medium
- **Cross-references**: ADR-003, OQ-05

Some inputs arrive in chunks (streaming API responses, large documents). Should
the firewall support incremental screening as chunks arrive, or require the
full input before screening? Incremental screening could detect attacks earlier
but requires buffering and state management.

**Rolling window approach**: One promising direction is to split large text
into overlapping windows of tokens and screen each window independently. This
enables:

1. **Granular detection**: For the instruction firewall use case (screening
   academic papers converted from PDF to markdown), rolling windows can
   red-flag specific *sections* of a document rather than the whole thing.
   This is directly useful for catching hidden prompt injections in academic
   research papers (~20 real examples found of researchers slipping injections
   past peer review).
2. **Parallel processing**: Windows can be screened in parallel, enabling
   throughput scaling.
3. **Large input handling**: No need to truncate long documents; each window
   is independently screened within the model's context length.

The PoC has directional (but buggy) Rust code for creating rolling windows
that can be referenced when designing this feature. This connects to OQ-05
because streaming/chunking affects how the firewall composes with other
guardrail systems in a pipeline.

Leave open for Phase 1 design, but the rolling window approach is the leading
candidate for Phase 2.
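
The rolling window idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a port of the PoC's Rust code: `rolling_windows`, `flagged_spans`, and the `score_window` callable are hypothetical names, and the scoring function stands in for whatever the firewall would compute per window.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def rolling_windows(
    tokens: Sequence[int], window_size: int, stride: int
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (start_offset, window) pairs of overlapping token windows.

    Hypothetical helper: windows overlap whenever stride < window_size, and
    the final window is shifted back so the tail of the input is covered.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    if stride > window_size:
        raise ValueError("stride must not exceed window_size, or tokens are skipped")
    n = len(tokens)
    if n == 0:
        return
    if n <= window_size:
        # Short inputs fit in a single window.
        yield 0, list(tokens)
        return
    start = 0
    while start + window_size < n:
        yield start, list(tokens[start:start + window_size])
        start += stride
    # The last window is anchored to the end of the input.
    last = n - window_size
    yield last, list(tokens[last:])


def flagged_spans(
    tokens: Sequence[int],
    window_size: int,
    stride: int,
    score_window: Callable[[List[int]], float],
    threshold: float,
) -> List[Tuple[int, int]]:
    """Return (start, end) token spans whose window score meets the threshold."""
    return [
        (start, start + len(window))
        for start, window in rolling_windows(tokens, window_size, stride)
        if score_window(window) >= threshold
    ]
```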

---

### ~~OQ-04: Should detection thresholds be per-model or globally configurable?~~

- **Origin**: [configuration.md](configuration.md), [codebook.md](codebook.md)
- **Status**: **resolved**
- **Priority**: medium
- **Resolution**: Both. Thresholds are **model-specific by default** (shipped
  with the codebook) but **globally overridable by the user**. Once calibrated,
  different models produce remarkably similar behavioral patterns (inspired
  by the "platonic representation hypothesis": different models converge on
  similar internal representations of the same data). The individual activation
  spaces differ, but the behavioral patterns they encode are consistent enough
  that thresholds transfer reasonably well. The codebook ships recommended
  thresholds calibrated for its model; users can adjust them.
- **Cross-references**: ADR-003, ADR-004
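
The resolution above (codebook defaults first, user overrides second) could look roughly like the following. This is a sketch under assumptions: the `Codebook` class, `resolve_thresholds`, and the signal names and values are all hypothetical and do not reflect the actual package API.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class Codebook:
    """Hypothetical stand-in for a codebook that ships recommended thresholds."""
    model_name: str
    recommended_thresholds: Mapping[str, float] = field(default_factory=dict)


def resolve_thresholds(
    codebook: Codebook, overrides: Optional[Mapping[str, float]] = None
) -> Dict[str, float]:
    """Start from the codebook's model-specific defaults, then apply user overrides."""
    resolved = dict(codebook.recommended_thresholds)
    for name, value in (overrides or {}).items():
        if name not in resolved:
            # Reject typos instead of silently adding an unused threshold.
            raise KeyError(f"unknown threshold: {name!r}")
        resolved[name] = float(value)
    return resolved


# Illustrative values only; the signal names and numbers are made up.
codebook = Codebook("SmolLM2-135M", {"injection": 0.80, "drift": 0.65})
thresholds = resolve_thresholds(codebook, {"injection": 0.90})
```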

---

## Theme: Integration

### OQ-05: How should the firewall integrate with existing guardrail systems?

- **Origin**: [firewall.md](firewall.md), [overview.md](overview.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending; needs deep dive into the current guardrail landscape)
- **Cross-references**: ADR-002

The behavioral firewall is complementary to text-surface defenses. Users may
want to run both Llama Guard (text classification) and alknet-firewall
(behavioral signals) in series. However, the behavioral approach is
fundamentally different: it requires access to the model and training on that
model's specific behavioral signals. This means direct API-level integration
with other systems may not be straightforward.

A deep dive into the current state of guardrail integration patterns
(LlamaFirewall's scanner interface, NeMo Guardrails' Colang DSL, etc.) is
needed to determine whether we should build adapters, define a common
interface, or simply provide a clean standalone API and let users compose
systems themselves.

Leave open; will research soon.
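
To make the "common interface" option concrete, here is one shape it might take. Everything in this sketch is hypothetical: `Scanner`, `Verdict`, `KeywordScanner`, and `run_in_series` are invented for illustration and are not the API of Llama Guard, LlamaFirewall, NeMo Guardrails, or alknet-firewall. The toy keyword scanner merely stands in for a real text-surface classifier.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Verdict:
    scanner: str
    blocked: bool
    reason: str = ""


class Scanner(Protocol):
    """Hypothetical common interface; not the API of any existing guardrail."""
    name: str

    def scan(self, text: str) -> Verdict: ...


class KeywordScanner:
    """Toy text-surface scanner standing in for a real text classifier."""
    name = "keyword"

    def __init__(self, banned: Iterable[str]) -> None:
        self._banned = [phrase.lower() for phrase in banned]

    def scan(self, text: str) -> Verdict:
        lowered = text.lower()
        for phrase in self._banned:
            if phrase in lowered:
                return Verdict(self.name, True, f"matched {phrase!r}")
        return Verdict(self.name, False)


def run_in_series(text: str, scanners: Iterable[Scanner]) -> List[Verdict]:
    """Run scanners in order and stop at the first one that blocks."""
    verdicts: List[Verdict] = []
    for scanner in scanners:
        verdict = scanner.scan(text)
        verdicts.append(verdict)
        if verdict.blocked:
            break
    return verdicts
```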

---

## Theme: Project Setup

### OQ-06: Should file-based configuration use TOML or YAML?

- **Origin**: [configuration.md](configuration.md)
- **Status**: open
- **Priority**: low
- **Resolution**: (pending; Phase 2 concern)
- **Cross-references**: None

Phase 1 uses constructor-based configuration only. A future phase may add
file-based configuration for easier deployment. TOML is consistent with
Python packaging (`pyproject.toml`) and is increasingly the standard for Python
config. YAML is more familiar in ops/ML contexts. Either works.