# Open Questions

Centralized tracker for unresolved questions across all architecture documents.

## Theme: Inference Backend

### OQ-01: Should ONNX Runtime be a supported inference backend in Phase 1?

- **Origin**: [model.md](model.md), [overview.md](overview.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending)
- **Cross-references**: ADR-006

ONNX Runtime has a much smaller install footprint (roughly 30-50 MB versus
200 MB-2.5 GB for PyTorch) and is well suited to inference-only use. Hugging Face's
`optimum` library provides drop-in replacement classes. However, supporting it in
Phase 1 adds complexity: the model must be exported to ONNX format, the `optimum`
integration must be tested, and the activation extraction API may differ from
PyTorch.

Consider: Is the smaller footprint worth the integration complexity in Phase 1,
or should ONNX support wait until Phase 2 when the core API is stable?
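
One way to keep this decision reversible is to put activation extraction behind a small interface, so that an ONNX implementation could be added in Phase 2 without touching downstream scoring code. The following is a minimal sketch under that assumption; `ActivationBackend`, `extract`, `FakeBackend`, and `mean_activation` are hypothetical names for illustration, not the project's actual API.

```python
from typing import Protocol, Sequence


class ActivationBackend(Protocol):
    """Hypothetical interface: any backend maps text to one activation vector."""

    def extract(self, text: str) -> Sequence[float]: ...


class FakeBackend:
    """Stand-in for a PyTorch or ONNX implementation (illustration only)."""

    def extract(self, text: str) -> Sequence[float]:
        # Toy "activations": text length and vowel count, not real model output.
        vowels = sum(ch in "aeiou" for ch in text.lower())
        return [float(len(text)), float(vowels)]


def mean_activation(backend: ActivationBackend, text: str) -> float:
    """Downstream code depends only on the interface, never on the backend."""
    values = backend.extract(text)
    return sum(values) / len(values)
```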

---

## Theme: Codebook Design

### OQ-02: What is the minimum viable codebook — can the 1,245-line PoC codebook be compressed?

- **Origin**: [codebook.md](codebook.md)
- **Status**: open
- **Priority**: high
- **Resolution**: (pending)
- **Cross-references**: ADR-004

The PoC codebook is 1,245 lines; much of it may be boilerplate, dead code,
or excessive parameterization from the research phase. Understanding what is
essential versus exploratory is critical for the initial extraction. The codebook
training pipeline (`run_manifold_projection.py`) should also be analyzed.

Consider: How many SVD dimensions are actually needed? What is the minimum
calibration dataset? Can spline distributions be simplified?
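
The question of how many SVD dimensions are needed can be framed as picking the smallest rank that retains a chosen share of the squared singular-value energy. The sketch below assumes the singular values are already available as a plain list; `min_rank_for_energy` and the 0.95 default are hypothetical choices, and the example values are illustrative rather than measurements from the PoC codebook.

```python
from typing import Sequence


def min_rank_for_energy(singular_values: Sequence[float], target: float = 0.95) -> int:
    """Return the smallest k whose top-k singular values retain at least
    `target` of the total energy (the sum of squared singular values)."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    energies = sorted((s * s for s in singular_values), reverse=True)
    total = sum(energies)
    if total == 0.0:
        raise ValueError("singular values must not all be zero")
    running = 0.0
    for k, energy in enumerate(energies, start=1):
        running += energy
        if running / total >= target:
            return k
    return len(energies)


# Illustrative values only, not taken from the PoC codebook.
example_values = [10.0, 5.0, 1.0, 0.5]
```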

---

## Theme: API Design

### OQ-03: Should the firewall support streaming/chunked input screening?

- **Origin**: [firewall.md](firewall.md)
- **Status**: open
- **Priority**: low
- **Resolution**: (pending)
- **Cross-references**: ADR-003

Some inputs arrive in chunks (streaming API responses, large documents). Should
the firewall support incremental screening as chunks arrive, or require the
full input before screening? Incremental screening could detect attacks earlier
but requires buffering and state management.

This is low priority for Phase 1 but affects the internal API design.
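
To make the buffering and state-management cost concrete, here is a minimal sketch of incremental screening over a bounded window of recent text. It assumes a scoring callable that takes a string and returns a float; `screen_stream`, `window_chars`, and `toy_score` are hypothetical names, and the toy scorer stands in for real activation-based scoring.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, Tuple


def screen_stream(
    chunks: Iterable[str],
    score: Callable[[str], float],
    threshold: float,
    window_chars: int = 2000,
) -> Iterator[Tuple[int, float, bool]]:
    """Yield (chunk_index, score, alarm) after each chunk arrives.

    The only state is a bounded buffer of recent chunks, so memory use
    does not grow with the length of the stream.
    """
    buffer: Deque[str] = deque()
    buffered = 0
    for index, chunk in enumerate(chunks):
        buffer.append(chunk)
        buffered += len(chunk)
        # Drop the oldest chunks once the window is exceeded, keeping at least one.
        while buffered > window_chars and len(buffer) > 1:
            buffered -= len(buffer.popleft())
        value = score("".join(buffer))
        yield index, value, value >= threshold


def toy_score(text: str) -> float:
    """Placeholder scorer (assumption): fraction of '!' characters."""
    return text.count("!") / max(len(text), 1)
```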

---

### OQ-04: Should detection thresholds be per-model or globally configurable?

- **Origin**: [configuration.md](configuration.md), [codebook.md](codebook.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending)
- **Cross-references**: ADR-003, ADR-004

Different detector models may produce different score distributions. Thresholds
that work for SmolLM2-135M may not work for a different model. Should
thresholds be tied to the codebook (per-model) or set globally by the user?

Consider: Per-model defaults with user overrides? Codebook ships with
recommended thresholds that the user can adjust?
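
The "per-model defaults with user overrides" option could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Thresholds` fields (`warn`, `block`), the `CODEBOOK_DEFAULTS` table, and the numeric values are placeholders rather than calibrated or documented settings.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical two-level thresholds; the field names are assumptions."""

    warn: float
    block: float


# Illustrative defaults a codebook could ship per detector model.
# The numbers are placeholders, not calibrated values.
CODEBOOK_DEFAULTS = {
    "SmolLM2-135M": Thresholds(warn=0.6, block=0.85),
}


def resolve_thresholds(
    model_name: str,
    warn: Optional[float] = None,
    block: Optional[float] = None,
) -> Thresholds:
    """Start from the per-model defaults, then apply any user overrides."""
    defaults = CODEBOOK_DEFAULTS[model_name]
    overrides = {
        name: value
        for name, value in (("warn", warn), ("block", block))
        if value is not None
    }
    resolved = replace(defaults, **overrides)
    if resolved.warn > resolved.block:
        raise ValueError("warn threshold must not exceed block threshold")
    return resolved
```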

---

## Theme: Integration

### OQ-05: How should the firewall integrate with existing guardrail systems?

- **Origin**: [firewall.md](firewall.md), [overview.md](overview.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending)
- **Cross-references**: ADR-002

The behavioral firewall is complementary to text-surface defenses. Users may
want to run both Llama Guard (text classification) and alknet-firewall
(behavioral signals) in series. How should these be composed?

Consider: Integration adapters? A common interface? Callback hooks? Or is
composition the user's responsibility and we just provide a clean standalone API?
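
If composition is left to the user, a clean standalone API mostly means that each screener is a plain callable returning a small result object. The sketch below shows series composition under that assumption; `Verdict`, `screen_in_series`, and both example screeners are hypothetical stand-ins and do not call Llama Guard or alknet-firewall.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Verdict:
    """Hypothetical common result type shared by all screeners."""

    source: str
    flagged: bool


Screener = Callable[[str], Verdict]


def screen_in_series(text: str, screeners: Sequence[Screener]) -> List[Verdict]:
    """Run screeners in order and stop at the first one that flags the input."""
    verdicts: List[Verdict] = []
    for screener in screeners:
        verdict = screener(text)
        verdicts.append(verdict)
        if verdict.flagged:
            break
    return verdicts


def keyword_screener(text: str) -> Verdict:
    # Stand-in for a text-surface classifier such as Llama Guard.
    return Verdict("text-classifier", "ignore previous instructions" in text.lower())


def behavioral_screener(text: str) -> Verdict:
    # Stand-in for a behavioral-signal check; the length rule is a toy.
    return Verdict("behavioral", len(text) > 10_000)
```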

---

## Theme: Project Setup

### OQ-06: Should file-based configuration use TOML or YAML?

- **Origin**: [configuration.md](configuration.md)
- **Status**: open
- **Priority**: low
- **Resolution**: (pending)
- **Cross-references**: None

Phase 1 uses constructor-based configuration only. A future phase may add
file-based configuration for easier deployment. TOML is consistent with
Python packaging (pyproject.toml) and increasingly the standard for Python
config. YAML is more familiar in ops/ML contexts. Either works.

---

### OQ-07: Is a Rust port feasible given current ML framework maturity?

- **Origin**: [overview.md](overview.md), ADR-001
- **Status**: open
- **Priority**: low
- **Resolution**: (pending)
- **Cross-references**: ADR-001

A Rust port using burn/cubecl was attempted during the PoC phase and failed.
The ML framework ecosystem in Rust is not yet mature enough for this type
of work. This remains a speculative Phase 3 goal. Revisit when burn/cubecl
matures or alternative Rust ML frameworks emerge.