alknet-firewall/docs/architecture/open-questions.md

# Open Questions

Centralized tracker for unresolved questions across all architecture documents.

## Theme: Inference Backend

### OQ-01: Should ONNX Runtime be a supported inference backend in Phase 1?

- **Origin**: [model.md](model.md), [overview.md](overview.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending)
- **Cross-references**: ADR-006

ONNX Runtime provides a much smaller install footprint (~30-50MB vs 200MB-2.5GB
for PyTorch) and is well-suited for inference-only use. HuggingFace's `optimum`
library provides drop-in replacement classes. However, supporting it in Phase 1
adds complexity: model must be exported to ONNX format, `optimum` integration
must be tested, and the activation extraction API may differ from PyTorch.

Consider: Is the smaller footprint worth the integration complexity in Phase 1,
or should ONNX support wait until Phase 2 when the core API is stable?

---

## Theme: Codebook Design

### OQ-02: What is the minimum viable codebook — can the 1,245-line PoC codebook be compressed?

- **Origin**: [codebook.md](codebook.md)
- **Status**: open
- **Priority**: high
- **Resolution**: (pending)
- **Cross-references**: ADR-004

The PoC codebook is 1,245 lines — much of it may be boilerplate, dead code,
or excessive parameterization from the research phase. Understanding what's
essential vs. exploratory is critical for the initial extraction. The codebook
training pipeline (`run_manifold_projection.py`) should also be analyzed.

Consider: How many SVD dimensions are actually needed? What's the minimum
calibration dataset? Can spline distributions be simplified?

---

## Theme: API Design

### OQ-03: Should the firewall support streaming/chunked input screening?

- **Origin**: [firewall.md](firewall.md)
- **Status**: open
- **Priority**: low
- **Resolution**: (pending)
- **Cross-references**: ADR-003

Some inputs arrive in chunks (streaming API responses, large documents). Should
the firewall support incremental screening as chunks arrive, or require the
full input before screening? Incremental screening could detect attacks earlier
but requires buffering and state management.

This is low priority for Phase 1 but affects the internal API design.

---

### OQ-04: Should detection thresholds be per-model or globally configurable?

- **Origin**: [configuration.md](configuration.md), [codebook.md](codebook.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending)
- **Cross-references**: ADR-003, ADR-004

Different detector models may produce different score distributions. Thresholds
that work for SmolLM2-135M may not work for a different model. Should
thresholds be tied to the codebook (per-model) or set globally by the user?

Consider: Per-model defaults with user overrides? Codebook ships with
recommended thresholds that the user can adjust?

---

## Theme: Integration

### OQ-05: How should the firewall integrate with existing guardrail systems?

- **Origin**: [firewall.md](firewall.md), [overview.md](overview.md)
- **Status**: open
- **Priority**: medium
- **Resolution**: (pending)
- **Cross-references**: ADR-002

The behavioral firewall is complementary to text-surface defenses. Users may
want to run both Llama Guard (text classification) and alknet-firewall
(behavioral signals) in series. How should these be composed?

Consider: Integration adapters? A common interface? Callback hooks? Or is
composition the user's responsibility and we just provide a clean standalone API?

---

## Theme: Project Setup

### OQ-06: Should file-based configuration use TOML or YAML?

- **Origin**: [configuration.md](configuration.md)
- **Status**: open
- **Priority**: low
- **Resolution**: (pending)
- **Cross-references**: None

Phase 1 uses constructor-based configuration only. A future phase may add
file-based configuration for easier deployment. TOML is consistent with
Python packaging (pyproject.toml) and increasingly the standard for Python
config. YAML is more familiar in ops/ML contexts. Either works.

---

### OQ-07: Is a Rust port feasible given current ML framework maturity?

- **Origin**: [overview.md](overview.md), ADR-001
- **Status**: open
- **Priority**: low
- **Resolution**: (pending)
- **Cross-references**: ADR-001

A Rust port using burn/cubecl was attempted during the PoC phase and failed.
The ML framework ecosystem in Rust is not yet mature enough for this type
of work. This remains a speculative Phase 3 goal. Revisit when burn/cubecl
matures or alternative Rust ML frameworks emerge.