Research-driven resolution of OQ-01, OQ-02, OQ-05, OQ-06:
- OQ-01: Remove ONNX Runtime from scope entirely. It doesn't support activation extraction natively (optimum #972 closed as not planned) and produces bloated model exports; burn/cublas via safetensors is a better future path.
- OQ-02: Codebook compresses ~65% (1,245 → 500-600 lines); add Package Structure and Extraction from PoC sections to codebook.md based on PoC analysis of metaspline firewall_codebook.py.
- OQ-05: Standalone API + thin adapter pattern (ADR-011); Phase 1 ships Firewall.screen() only, Phase 2 adds <100-line adapter packages for LlamaFirewall, OpenAI Agents SDK, NeMo Guardrails.
- OQ-06: TOML for file-based config: standard modern Python, two-way door.

Also: research OQ-03 rolling windows from taskgraph-semantic reference code, remove onnxruntime/optimum from dependencies, move streaming screening to Phase 2, add burn/cublas as Phase 3 alternative backend.
Open Questions
Centralized tracker for unresolved questions across all architecture documents.
Theme: Inference Backend
OQ-01: Should ONNX Runtime be a supported inference backend in Phase 1?
- Origin: model.md, overview.md
- Status: resolved
- Priority: medium
- Resolution: Removed from scope entirely. ONNX Runtime does not natively support `output_hidden_states=True` (HuggingFace optimum issue #972 was closed as "not planned"), so activation extraction, the core operation, is impractical without a custom ONNX graph modification pipeline. The ONNX model format also produces bloated exports. A future alternative inference path using burn/cublas with safetensors is more promising since it supports all platforms and uses the same model format we already require.
- Cross-references: ADR-006
Theme: Codebook Design
OQ-02: What is the minimum viable codebook — can the 1,245-line PoC codebook be compressed?
- Origin: codebook.md
- Status: resolved
- Priority: high
- Resolution: Yes. Roughly 65% compression to 500–600 lines total (400–500 runtime + 150–200 training). The PoC contains ~480 lines of essential runtime code plus ~178 lines needed from metaspline core. The 5x-repeated decomposition pipeline collapses into a single `decompose()` function (~50 lines saved). The histogram classifier (~130 lines) is exploratory and not MVP. The `build()` method (429 lines) is decomposed: training logic moves to `training/compiler.py`, and runtime state becomes immutable serialized data. See poc-architecture.md and the Package Structure section in codebook.md.
- Cross-references: ADR-004
Theme: API Design
OQ-03: Should the firewall support streaming/chunked input screening?
- Origin: firewall.md
- Status: open
- Priority: medium
- Cross-references: ADR-003, OQ-05
Some inputs arrive in chunks (streaming API responses, large documents). Should the firewall support incremental screening as chunks arrive, or require the full input before screening? Incremental screening could detect attacks earlier but requires buffering and state management.
Rolling window approach: One promising direction is rolling windows of tokens — chunking large text into overlapping windows and screening each window independently. This enables:
- Granular detection: For the instruction firewall use case (screening academic papers converted from PDF to markdown), rolling windows can red-flag specific sections of a document rather than the whole thing. This is directly useful for catching hidden prompt injections in academic research papers (~20 real examples found of researchers slipping injections past peer review).
- Parallel processing: Windows can be screened in parallel, enabling throughput scaling.
- Large input handling: No need to truncate long documents; each window is independently screened within the model's context length.
The PoC has directional (but buggy) Rust code for creating rolling windows that can be referenced when designing this feature. This connects to OQ-05 because streaming/chunking affects how the firewall composes with other guardrail systems in a pipeline.
Leave open for Phase 1 design, but the rolling window approach is the leading candidate for Phase 2.
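The rolling window idea above can be sketched in a few lines of Python. This is a minimal illustration, not the PoC's Rust code: the function names `rolling_windows` and `flag_windows`, the `size`/`stride` parameters, and the `is_suspicious` callback are all hypothetical, and the tokens are assumed to be already produced by whatever tokenizer the model uses.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def rolling_windows(
    tokens: Sequence[str], size: int, stride: int
) -> Iterator[Tuple[int, Sequence[str]]]:
    """Yield (start_index, window) pairs; consecutive windows overlap by size - stride."""
    if size <= 0 or not 0 < stride <= size:
        raise ValueError("need size > 0 and 0 < stride <= size")
    if not tokens:
        return
    start = 0
    while True:
        yield start, tokens[start:start + size]
        # Stop once the current window reaches the end of the input.
        if start + size >= len(tokens):
            break
        start += stride


def flag_windows(
    tokens: Sequence[str],
    size: int,
    stride: int,
    is_suspicious: Callable[[Sequence[str]], bool],
) -> List[Tuple[int, int]]:
    """Return (start, end) token spans of windows the (hypothetical) screener flags."""
    return [
        (start, start + len(window))
        for start, window in rolling_windows(tokens, size, stride)
        if is_suspicious(window)
    ]
```

Because each window is screened independently, the list comprehension in `flag_windows` could be swapped for a parallel map; the returned spans are what would let a caller red-flag a specific section of a document rather than the whole input.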
OQ-04: Should detection thresholds be per-model or globally configurable?
- Origin: configuration.md, codebook.md
- Status: resolved
- Priority: medium
- Resolution: Both. Thresholds are model-specific by default (shipped with the codebook) but globally overridable by the user. Once calibrated, different models produce remarkably similar behavioral patterns (inspired by the "platonic representation hypothesis": different models converge on similar internal representations of the same data). The individual activation spaces differ, but the behavioral patterns they encode are consistent enough that thresholds transfer reasonably well. The codebook ships recommended thresholds calibrated for its model; users can adjust.
- Cross-references: ADR-003, ADR-004
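The precedence described in this resolution (codebook default, user override wins) might look like the sketch below. The `Codebook` class, its `recommended_threshold` field, and `resolve_threshold` are hypothetical names used only for illustration; the actual configuration surface is defined in configuration.md and codebook.md.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Codebook:
    """Hypothetical stand-in: a codebook that ships a threshold calibrated for its model."""
    model_id: str
    recommended_threshold: float


def resolve_threshold(codebook: Codebook, user_override: Optional[float] = None) -> float:
    """Use the global user override when given, else the codebook's calibrated default."""
    if user_override is not None:
        return user_override
    return codebook.recommended_threshold
```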
Theme: Integration
OQ-05: How should the firewall integrate with existing guardrail systems?
- Origin: firewall.md, overview.md
- Status: resolved
- Priority: medium
- Resolution: Standalone API + thin adapter pattern (ADR-011). Phase 1: ship the standalone `Firewall.screen(text) → Alarm` API only. Phase 2: build thin adapter packages (<100 lines each) for LlamaFirewall, OpenAI Agents SDK, and NeMo Guardrails as optional dependencies. Do NOT build a common `ScreeningProvider` interface: behavioral detection is fundamentally different from text-surface defenses, and premature abstraction would be constraining.
- Cross-references: ADR-002, ADR-011
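A rough sketch of the thin adapter pattern follows. Only `Firewall.screen(text) → Alarm` comes from the resolution above; the `Alarm` fields, the `Firewall` constructor, and `HostGuardrailAdapter` with its dict result are assumptions made for illustration and do not reflect the real interfaces of LlamaFirewall, the OpenAI Agents SDK, or NeMo Guardrails.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class Alarm:
    """Hypothetical result type; the field names are assumptions."""
    triggered: bool
    score: float


class Firewall:
    """Stand-in for the standalone API; a real one would inspect model activations."""

    def __init__(self, scorer: Callable[[str], float], threshold: float) -> None:
        self._scorer = scorer
        self._threshold = threshold

    def screen(self, text: str) -> Alarm:
        score = self._scorer(text)
        return Alarm(triggered=score >= self._threshold, score=score)


class HostGuardrailAdapter:
    """Hypothetical adapter: translates Alarm into the shape an imagined host framework expects."""

    def __init__(self, firewall: Firewall) -> None:
        self._firewall = firewall

    def __call__(self, text: str) -> Dict[str, Union[bool, float]]:
        alarm = self._firewall.screen(text)
        return {"allow": not alarm.triggered, "score": alarm.score}
```

Each adapter only translates between `Alarm` and one host's result type, which is why no shared `ScreeningProvider` abstraction is needed and why the per-framework packages can stay small.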
Theme: Project Setup
OQ-06: Should file-based configuration use TOML or YAML?
- Origin: configuration.md
- Status: resolved
- Priority: low
- Resolution: TOML. Consistent with modern Python packaging conventions (`pyproject.toml`) and increasingly the standard for Python configuration. This is a two-way door decision: reverting to YAML later is straightforward.
- Cross-references: None