# Open Questions

Centralized tracker for unresolved questions across all architecture documents.

## Theme: Inference Backend

### ~~OQ-01: Should ONNX Runtime be a supported inference backend in Phase 1?~~

- **Origin**: [model.md](model.md), [overview.md](overview.md)
- **Status**: **resolved**
- **Priority**: medium
- **Resolution**: Removed from scope entirely. ONNX Runtime, as used through
  HuggingFace Optimum, does not natively support `output_hidden_states=True`
  (Optimum issue #972 was closed as "not planned"). That makes activation
  extraction, the core operation, impractical without a custom pipeline that
  modifies the ONNX graph. ONNX exports are also bloated. A more promising
  future inference path is burn/cublas with safetensors, which uses the same
  model format we already require and can reach all target platforms through
  burn's backends (cuBLAS itself is NVIDIA-only).
- **Cross-references**: ADR-006

---

## Theme: Codebook Design

### ~~OQ-02: What is the minimum viable codebook — can the 1,245-line PoC codebook be compressed?~~

- **Origin**: [codebook.md](codebook.md)
- **Status**: **resolved**
- **Priority**: high
- **Resolution**: Yes, with ~65% compression to 500–600 lines total
  (400–500 runtime + 150–200 training). The PoC contains ~480 lines of
  essential runtime code plus ~178 lines needed from metaspline core. The
  decomposition pipeline, repeated five times in the PoC, collapses into a
  single `decompose()` function (saving ~50 lines). The histogram classifier
  (~130 lines) is exploratory and excluded from the MVP. The 429-line
  `build()` method is split up: training logic moves to
  `training/compiler.py`, and runtime state becomes immutable serialized data.
  See [poc-architecture.md](../research/codebook-analysis/poc-architecture.md)
  and the Package Structure section in [codebook.md](codebook.md).
- **Cross-references**: ADR-004
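
The training/runtime split in this resolution can be illustrated with a minimal sketch. Everything here is hypothetical: `CodebookState`, `compile_codebook`, and the fields `model_id`, `layer`, and `thresholds` are placeholders, JSON stands in for whatever serialization the codebook actually uses, and the "max benign score" rule is a stand-in, not the project's calibration method.

```python
import json
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Sequence


@dataclass(frozen=True)
class CodebookState:
    """Hypothetical immutable runtime state shipped inside a codebook."""

    model_id: str
    layer: int
    thresholds: Mapping[str, float]

    def __post_init__(self) -> None:
        # Copy, then wrap in a read-only view so the state cannot be mutated later.
        object.__setattr__(self, "thresholds", MappingProxyType(dict(self.thresholds)))

    def to_json(self) -> str:
        payload = {"model_id": self.model_id, "layer": self.layer,
                   "thresholds": dict(self.thresholds)}
        return json.dumps(payload, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "CodebookState":
        payload = json.loads(text)
        return cls(payload["model_id"], payload["layer"], payload["thresholds"])


def compile_codebook(model_id: str, layer: int,
                     benign_scores: Mapping[str, Sequence[float]]) -> CodebookState:
    """Hypothetical training-side entry point (would live in training/compiler.py).

    Placeholder calibration: each threshold is the highest score seen on benign
    data for that pattern. This is NOT the project's calibration method.
    """
    thresholds = {name: max(scores) for name, scores in benign_scores.items()}
    return CodebookState(model_id, layer, thresholds)
```

In this shape the runtime package only ever calls `CodebookState.from_json`, so none of the training code has to ship with it.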

---

## Theme: API Design

### OQ-03: Should the firewall support streaming/chunked input screening?

- **Origin**: [firewall.md](firewall.md)
- **Status**: open
- **Priority**: medium
- **Cross-references**: ADR-003, OQ-05

Some inputs arrive in chunks (streaming API responses, large documents). Should
the firewall support incremental screening as chunks arrive, or require the
full input before screening? Incremental screening could detect attacks earlier
but requires buffering and state management.

**Rolling window approach**: One promising direction is rolling windows of
tokens, which split large text into overlapping windows and screen each
window independently. This enables:

1. **Granular detection**: For the instruction firewall use case (screening
   academic papers converted from PDF to markdown), rolling windows can
   red-flag specific *sections* of a document rather than the whole thing.
   This is directly useful for catching hidden prompt injections in academic
   research papers, where ~20 real cases have been found of researchers
   slipping injections past peer review.
2. **Parallel processing**: Windows are independent, so they can be screened
   in parallel to scale throughput.
3. **Large input handling**: Long documents need not be truncated; each window
   fits within the model's context length and is screened independently.
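
A minimal sketch of the rolling window approach, assuming the text is already tokenized into a list of token IDs and that a hypothetical `screen_window` callable returns one score per window. The names `Window`, `rolling_windows`, and `screen_document`, the 512/128 window and overlap sizes, and the 0.5 threshold are illustrative assumptions, not project defaults. Each window records its token offsets so a flagged window can be mapped back to a section of the document, and the start positions are chosen so the final window always reaches the end of the input, an edge case that naive range-based loops miss.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Window:
    start: int  # first token index (inclusive)
    end: int    # one past the last token index (exclusive)


def rolling_windows(n_tokens: int, size: int, overlap: int) -> List[Window]:
    """Split token positions [0, n_tokens) into overlapping windows."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if n_tokens == 0:
        return []
    if n_tokens <= size:
        return [Window(0, n_tokens)]
    stride = size - overlap
    starts = list(range(0, n_tokens - size + 1, stride))
    # If the stride does not land exactly on the end, add one final window
    # flush with the end so the trailing tokens are never skipped.
    if starts[-1] + size < n_tokens:
        starts.append(n_tokens - size)
    return [Window(s, s + size) for s in starts]


def screen_document(
    tokens: Sequence[int],
    screen_window: Callable[[Sequence[int]], float],
    size: int = 512,
    overlap: int = 128,
    threshold: float = 0.5,
    max_workers: int = 4,
) -> List[Window]:
    """Screen every window in parallel and return those scoring >= threshold."""
    windows = rolling_windows(len(tokens), size, overlap)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so scores line up with windows.
        scores = list(pool.map(lambda w: screen_window(tokens[w.start:w.end]), windows))
    return [w for w, score in zip(windows, scores) if score >= threshold]
```

With this scheme any span of at most `overlap` tokens lies entirely inside at least one window, so a short injection is never split across every window that sees it. Threads only add throughput if `screen_window` releases the GIL, as native inference code usually does; that is an assumption about the backend.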

The PoC includes Rust code for building rolling windows that is directionally
right but buggy; it can serve as a reference when designing this feature. This
question connects to OQ-05 because streaming and chunking affect how the
firewall composes with other guardrail systems in a pipeline.

Leave this open for Phase 1 design; the rolling window approach is the leading
candidate for Phase 2.

---

### ~~OQ-04: Should detection thresholds be per-model or globally configurable?~~

- **Origin**: [configuration.md](configuration.md), [codebook.md](codebook.md)
- **Status**: **resolved**
- **Priority**: medium
- **Resolution**: Both. Thresholds are **model-specific by default** (shipped
  with the codebook) but **globally overridable by the user**. Once
  calibrated, different models produce remarkably similar behavioral
  patterns. This view is inspired by the "platonic representation
  hypothesis": different models converge on similar internal representations
  of the same data. The individual activation spaces differ, but the
  behavioral patterns they encode are consistent enough that thresholds
  transfer reasonably well. The codebook ships recommended thresholds
  calibrated for its model, and users can adjust them.
- **Cross-references**: ADR-003, ADR-004
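
A minimal sketch of the "model-specific default, user override" rule, assuming thresholds are keyed by a pattern or detector name and that overrides are given per name. `resolve_thresholds` and the example keys are hypothetical. Rejecting unknown names is a design choice in this sketch so that a typo in user configuration fails loudly instead of silently doing nothing.

```python
from typing import Dict, Mapping, Optional


def resolve_thresholds(
    codebook_defaults: Mapping[str, float],
    user_overrides: Optional[Mapping[str, float]] = None,
) -> Dict[str, float]:
    """Layer user overrides on top of the codebook's calibrated defaults."""
    resolved = dict(codebook_defaults)  # copy: never mutate the shipped defaults
    for name, value in (user_overrides or {}).items():
        # Reject unknown names so a misspelled override cannot be silently ignored.
        if name not in resolved:
            raise KeyError(f"unknown threshold {name!r}")
        resolved[name] = float(value)
    return resolved


defaults = {"injection": 0.62, "exfiltration": 0.55}
print(resolve_thresholds(defaults, {"injection": 0.7}))
# → {'injection': 0.7, 'exfiltration': 0.55}
```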

---

## Theme: Integration

### ~~OQ-05: How should the firewall integrate with existing guardrail systems?~~

- **Origin**: [firewall.md](firewall.md), [overview.md](overview.md)
- **Status**: **resolved**
- **Priority**: medium
- **Resolution**: Standalone API + thin adapter pattern (ADR-011). Phase 1:
  ship only the standalone `Firewall.screen(text) → Alarm` API. Phase 2: build
  thin adapter packages (<100 lines each) for LlamaFirewall, OpenAI Agents
  SDK, and NeMo Guardrails as optional dependencies. Do NOT build a common
  `ScreeningProvider` interface: behavioral detection is fundamentally
  different from text-surface defenses, and a premature abstraction would
  constrain the design.
- **Cross-references**: ADR-002, ADR-011
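
A sketch of the standalone-API-plus-thin-adapter shape, with everything hypothetical: the `Alarm` fields (`triggered`, `score`), the stub `Firewall` whose `screen` only checks a keyword so the example runs, and `GuardDecision`, which stands in for whatever result type a host framework expects. It deliberately does not model the real LlamaFirewall, OpenAI Agents SDK, or NeMo Guardrails APIs. The point is that each adapter is a small translation layer that owns its framework's vocabulary, with no shared `ScreeningProvider` base class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    """Hypothetical result of Firewall.screen()."""
    triggered: bool
    score: float


class Firewall:
    """Stand-in for the Phase 1 standalone API; the real screen() runs the model."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold

    def screen(self, text: str) -> Alarm:
        # Placeholder scoring so the sketch runs; this is NOT behavioral detection.
        score = 1.0 if "ignore previous instructions" in text.lower() else 0.0
        return Alarm(triggered=score >= self.threshold, score=score)


# A thin adapter package would contain only code like the following.

@dataclass(frozen=True)
class GuardDecision:
    """Hypothetical decision type of some host guardrail framework."""
    allow: bool
    reason: str


class FirewallGuard:
    """Adapter: translates an Alarm into the host framework's decision type."""

    def __init__(self, firewall: Firewall) -> None:
        self._firewall = firewall

    def check(self, text: str) -> GuardDecision:
        alarm = self._firewall.screen(text)
        if alarm.triggered:
            return GuardDecision(allow=False, reason=f"firewall alarm (score={alarm.score:.2f})")
        return GuardDecision(allow=True, reason="")
```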

---

## Theme: Project Setup

### ~~OQ-06: Should file-based configuration use TOML or YAML?~~

- **Origin**: [configuration.md](configuration.md)
- **Status**: **resolved**
- **Priority**: low
- **Resolution**: TOML. It is consistent with modern Python packaging
  conventions (`pyproject.toml`), is increasingly the standard for Python
  configuration, and can be read with the standard-library `tomllib` module
  on Python 3.11+, whereas YAML needs a third-party parser. This is a
  two-way-door decision: switching to YAML later is straightforward.
- **Cross-references**: None