---
status: draft
last_updated: 2026-06-26
---

# Open Questions

Questions are organized by theme. Each question has a stable OQ-ID for cross-referencing from spec documents.

Door type classifications follow ADR-009:
- **One-way door**: Reversal requires rewriting significant code or permanently closes a capability. Requires ADR before implementation.
- **Two-way door**: Reversal is cheap or additive. Can be decided during implementation.

## Theme: Core Types

### OQ-01: BiStream Type Definition

- **Origin**: [overview.md](overview.md)
- **Status**: resolved
- **Door type**: One-way
- **Priority**: high
- **Resolution**: BiStream is a trait (`AsyncRead + AsyncWrite + Send + Unpin`). Handlers receive a `Connection` (not a single BiStream). This preserves the WASM door — browser clients can implement BiStream over WebTransport streams. See ADR-007.
- **Cross-references**: ADR-002, ADR-007, ADR-009

### OQ-02: AuthContext Resolution Timing

- **Origin**: [overview.md](overview.md)
- **Status**: resolved
- **Door type**: One-way
- **Priority**: high
- **Resolution**: Hybrid model (Option C) — endpoint resolves what it can (e.g., TLS client certificate), handler resolves what it must (e.g., AuthToken in first frame). AuthContext may be partial when `handle()` is called. See ADR-004.
- **Cross-references**: ADR-002, ADR-004

## Theme: ALPN and Routing

### OQ-03: ALPN String Naming Convention

- **Origin**: [overview.md](overview.md)
- **Status**: resolved
- **Door type**: One-way
- **Priority**: medium
- **Resolution**: Custom ALPNs use the form `alknet/<name>` (no version suffix); standard ALPNs use their IANA-registered strings. No version negotiation initially. See ADR-006.
- **Cross-references**: ADR-001, ADR-006
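
The naming rule can be illustrated with a minimal sketch. The function names and the validation rule (reject empty names and names containing `/`) are assumptions made for illustration; ADR-006 is the authority on what counts as a valid name.

```rust
/// Prefix for custom alknet ALPN strings, per the resolution above.
const ALKNET_ALPN_PREFIX: &str = "alknet/";

/// Builds a custom ALPN identifier such as `alknet/call`.
/// Rejecting empty names and names containing `/` is an assumed rule,
/// not one stated in ADR-006.
fn custom_alpn(name: &str) -> Option<Vec<u8>> {
    if name.is_empty() || name.contains('/') {
        return None;
    }
    Some(format!("{ALKNET_ALPN_PREFIX}{name}").into_bytes())
}

/// True when an ALPN offered by a peer is in the alknet custom namespace.
fn is_custom_alpn(alpn: &[u8]) -> bool {
    alpn.starts_with(ALKNET_ALPN_PREFIX.as_bytes())
}
```

With this sketch, `custom_alpn("call")` yields the bytes of `alknet/call`, while a standard string such as `h3` is not treated as custom.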

### OQ-04: Dynamic Handler Registration at Runtime vs Static at Startup

- **Origin**: [overview.md](overview.md)
- **Status**: resolved
- **Door type**: Two-way
- **Priority**: low
- **Resolution**: Static registration at startup. `HandlerRegistry` is immutable after construction. ALPN strings in the TLS `ServerConfig` are derived from the registry at startup — adding a handler at runtime requires rebuilding the TLS config. The `ArcSwap<HandlerRegistry>` pattern can be applied later if needed (two-way door). See ADR-010.

  **Scope clarification (ADR-024)**: This resolution applies to the
  **`HandlerRegistry`** (ALPN string → `ProtocolHandler`), which is what
  ADR-010 governs. The call protocol's **`OperationRegistry`** (operation
  name → `HandlerRegistration`) is a *separate* registry living inside the
  `CallAdapter`, behind the single ALPN `alknet/call`. Its mutability
  profile is governed by ADR-024, not by this OQ. ADR-024 layers the
  operation registry by trust boundary: curated `Local` ops are immutable
  (same rationale as here — composing ops are privileged, the startup trust
  boundary is where their authority is granted); `Session` and imported
  (`FromCall` etc.) ops are dynamic at their respective trust-boundary
  scopes (session, connection). The pre-ADR-024 blanket immutability claim
  in `operation-registry.md` was inherited by analogy from this OQ and did
  not actually apply — the TLS-config argument that justifies
  `HandlerRegistry` immutability does not touch the `OperationRegistry`.
- **Cross-references**: ADR-001, ADR-010, ADR-024, [endpoint.md](crates/core/endpoint.md), [operation-registry.md](crates/call/operation-registry.md)
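
One way to picture "immutable after construction" is a builder that is consumed to produce a registry with no mutating methods. The sketch below is illustrative only: `HandlerRegistryBuilder`, `alpn_protocols`, the `name` method, and the `Echo` handler are hypothetical names, and the real `ProtocolHandler` trait is richer than this stand-in.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Stand-in for the real `ProtocolHandler` trait; `name` is hypothetical.
trait ProtocolHandler: Send + Sync {
    fn name(&self) -> &'static str;
}

/// Mutable only while the assembly layer wires handlers at startup.
#[derive(Default)]
struct HandlerRegistryBuilder {
    handlers: BTreeMap<Vec<u8>, Arc<dyn ProtocolHandler>>,
}

impl HandlerRegistryBuilder {
    fn register(mut self, alpn: &str, handler: Arc<dyn ProtocolHandler>) -> Self {
        self.handlers.insert(alpn.as_bytes().to_vec(), handler);
        self
    }

    /// Consumes the builder; the resulting registry has no `register`.
    fn build(self) -> HandlerRegistry {
        HandlerRegistry { handlers: self.handlers }
    }
}

/// Read-only view used by the accept loop for ALPN dispatch.
struct HandlerRegistry {
    handlers: BTreeMap<Vec<u8>, Arc<dyn ProtocolHandler>>,
}

impl HandlerRegistry {
    fn get(&self, alpn: &[u8]) -> Option<&Arc<dyn ProtocolHandler>> {
        self.handlers.get(alpn)
    }

    /// ALPN list for the TLS server config, derived from the registry keys.
    fn alpn_protocols(&self) -> Vec<Vec<u8>> {
        self.handlers.keys().cloned().collect()
    }
}

/// Trivial handler used only to exercise the sketch.
struct Echo;

impl ProtocolHandler for Echo {
    fn name(&self) -> &'static str {
        "echo"
    }
}
```

Because the ALPN list is computed from the registry, adding a handler after startup would also mean recomputing that list and rebuilding the TLS config, which is the coupling the resolution describes.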

## Theme: Transport and Endpoint

### OQ-05: Multi-Connectivity Endpoint

- **Origin**: [overview.md](overview.md)
- **Status**: resolved
- **Door type**: One-way
- **Priority**: high
- **Resolution**: `AlknetEndpoint` supports both `quinn::Endpoint` (public QUIC+TLS) and `iroh::Endpoint` (P2P relay-assisted) simultaneously, both optional and feature-gated. Both produce QUIC connections that dispatch through the same `HandlerRegistry` by ALPN string. These are not interchangeable transports — they serve fundamentally different deployment contexts (public IP vs NAT traversal). TCP is not an endpoint concern — bare TCP SSH is handled by the SSH handler directly. See ADR-010.
- **Cross-references**: ADR-001, ADR-010, [endpoint.md](crates/core/endpoint.md)

### OQ-06: Server-Side ALPN vs Client-Side ALPN

- **Origin**: ADR-001
- **Status**: resolved
- **Door type**: One-way
- **Priority**: low
- **Resolution**: One ALPN per connection. Clients open one QUIC connection per ALPN. QUIC connections are cheap (multiplexed over the same UDP flow). See ADR-006.
- **Cross-references**: ADR-001, ADR-006

## Theme: Call Protocol

### OQ-07: Call Protocol Scope Within a Connection

- **Origin**: ADR-005
- **Status**: resolved
- **Door type**: Two-way
- **Priority**: medium
- **Resolution**: The call protocol uses bidirectional QUIC streams with EventEnvelope framing and ID-based correlation via PendingRequestMap. The protocol is stream-agnostic — the client can open one stream per operation, multiplex on one stream, or any mix. Correlation is by request ID, not by stream. Both sides can initiate calls. One `alknet/call` connection gives access to the full operation registry (call, subscribe, batch, schema). No multiplexing layer is needed inside the connection. See ADR-012.
- **Cross-references**: ADR-005, ADR-012
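
ID-based correlation can be sketched as a map from request ID to a waiting receiver. This is an illustration under stated assumptions: the `Response` type and the method names are hypothetical, and `std::sync::mpsc` stands in for the tokio `oneshot` channels the real `PendingRequestMap` uses.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical response payload; the real wire type is `EventEnvelope`.
#[derive(Debug, PartialEq)]
struct Response {
    request_id: u64,
    body: String,
}

/// Maps in-flight request IDs to the waiter for each response.
#[derive(Default)]
struct PendingRequestMap {
    next_id: u64,
    waiters: HashMap<u64, Sender<Response>>,
}

impl PendingRequestMap {
    /// Allocates a request ID and returns the receiver for its response.
    fn register(&mut self) -> (u64, Receiver<Response>) {
        let id = self.next_id;
        self.next_id += 1;
        let (tx, rx) = channel();
        self.waiters.insert(id, tx);
        (id, rx)
    }

    /// Routes a response by ID, regardless of which stream carried it.
    /// Returns false when no request with that ID is pending.
    fn complete(&mut self, response: Response) -> bool {
        match self.waiters.remove(&response.request_id) {
            Some(tx) => tx.send(response).is_ok(),
            None => false,
        }
    }
}
```

Responses may arrive in any order and on any stream; each waiter still receives only the response carrying its own ID.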

## Theme: Security

### OQ-08: Vault Integration Point

- **Origin**: [overview.md](overview.md)
- **Status**: resolved
- **Door type**: One-way
- **Priority**: medium
- **Resolution**: CLI-embedded, assembly-layer only. The CLI binary instantiates `VaultServiceHandle` locally at startup, derives and decrypts the credentials each handler needs, and injects them into handler capabilities. alknet-vault has no ALPN, no alknet-core dependency, and no operations registered in the call protocol. The master seed and derived private keys never cross the network. The vault is a capability source, not a network service. See ADR-008 and ADR-014.
- **Cross-references**: ADR-003, ADR-005, ADR-008, ADR-014

## Deferred Questions

These questions are acknowledged but not active. They will be promoted to open when their crate is being specified.

### OQ-09: WASM Target Boundaries

- **Origin**: [overview.md](overview.md)
- **Status**: deferred
- **Door type**: One-way (when applicable)
- **Priority**: low
- **Resolution**: Not an active question — WASM compatibility is a design constraint (see ADR-009, overview.md design principles), not a deliverable. Specific WASM targeting decisions will be made when individual crates are implemented. **BiStream being a trait preserves the *client-side* stream door** — a browser can implement BiStream over WebTransport streams. **The *server-side* dispatch door is NOT preserved by ADR-007 and is a known, accepted closure**: `Connection` is a concrete quinn-bound struct (not a trait), the accept loop uses `tokio::spawn` (tokio does not run on WASM), and the call-protocol dispatch internals (`PendingRequestMap`, `CallAdapter`) use tokio `oneshot`/`mpsc` channels. A WASM server-side peer would require a `Connection` trait and a runtime-abstracted accept loop — not planned. The browser path is client-side via a JS SDK, not server-side Rust-to-WASM. This is an explicit one-way door, not an oversight.
- **Cross-references**: ADR-007, ADR-009

### OQ-10: Git Adapter Scope — Smart Protocol Only or Full Server?

- **Origin**: [overview.md](overview.md)
- **Status**: deferred
- **Door type**: Two-way
- **Priority**: low
- **Resolution**: Deferred per the cleanup plan. Start with git smart protocol over QUIC streams. ERC721 integration and full server capabilities are additive. **Composability fork (review #002 W18)**: whether git operations are registered in the `OperationRegistry` and callable via `env.invoke()`, or only available as raw smart protocol on `alknet/git`, is a separate decision from ERC721 scope. The path of least resistance (raw smart protocol only) forecloses agent composition of git operations: an agent handler that wants to compose `git/clone` cannot, because there is no `OperationSpec`, no `Handler`, and no registration. To make git composable, a call-protocol projection (a set of `HandlerRegistration` bundles wrapping git operations behind the registry) must be built alongside or instead of the raw handler. Resolve this when speccing alknet-git; do not defer it past that point.
- **Cross-references**: ADR-001

## Theme: alknet-core

### OQ-11: Handler-Level Auth Resolution Observability

- **Origin**: [auth.md](crates/core/auth.md)
- **Status**: resolved
- **Door type**: Two-way
- **Priority**: medium
- **Resolution**: **Option B: handlers store resolved identity on the Connection.** When a handler resolves identity inside `handle()` (the handler-level auth phase), it calls `connection.set_identity(identity)` to store the resolved `Identity` on the connection object. Code that holds the connection can then read it for connection logging, audit trails, and metrics; in practice that is the handler task, not the endpoint (see the C13 resolution below).

  Why not Option A (return identity from `handle()`): it changes the `ProtocolHandler` trait signature for all handlers, even those that don't do auth resolution (DNS, health check). It also assumes one identity per connection — but the call protocol can have different identities per request on the same connection (one connection, multiple `call.requested` events with different auth tokens). Returning a single identity from `handle()` would be misleading for the call protocol.

  Why not Option C (identity stays local): the resolved identity is useful beyond the handler. The endpoint may want to log "connection from X authenticated as Y." A connection-level observability layer needs the identity. If it stays local, every handler that resolves identity would need to duplicate logging logic, and the endpoint can't correlate connections to identities.

  **Two identity scopes exist and must not be conflated:**
  - **Connection-level identity** (this decision): set once by the handler in `handle()`, stored on `Connection`, and read for logging/observability by whoever holds the connection (the handler task; see the C13 resolution below). This is the "connection owner": who opened this QUIC connection.
  - **Per-request identity** (already in the call protocol spec): set per `call.requested` by the `CallAdapter`, stored on `OperationContext.identity`. This is the "call caller" — who is making this specific call, which may upgrade mid-session (different auth tokens on the same connection).

  Both exist. The connection-level identity is the stable "who is this connection from"; the per-request identity is the dynamic "who is this specific call from." The call protocol's per-request resolution (which may produce a different identity than the connection-level resolution) takes precedence for ACL on `OperationContext` — the connection-level identity is for observability only, not for ACL.

  **C13 resolution (review #002)**: the endpoint does **not** read
  `identity()` after `handle()` returns. The `Connection` is moved into the
  spawned handler task (endpoint.md), so the endpoint no longer has a
  reference to it. Connection-level observability (remote addr, ALPN,
  connection ID) is logged by the endpoint *before* the move. Identity-level
  observability is logged by the handler (the handler knows which identity
  it resolved and can log it). There is no `Arc<Connection>` sharing or
  channel-based identity-reporting mechanism — the simplest honest answer
  that avoids over-engineering the observability path before there's a
  demonstrated need. If a future use case requires the endpoint to
  correlate connections to identities, an `Arc<Connection>` or a
  side-channel can be added then.
- **Cross-references**: ADR-004, ADR-011, ADR-015 (per-request identity on OperationContext), [auth.md](crates/core/auth.md)

### OQ-12: TLS Identity Provisioning in AlknetEndpoint

- **Origin**: [endpoint.md](crates/core/endpoint.md), [config.md](crates/core/config.md)
- **Status**: resolved
- **Door type**: One-way
- **Priority**: high
- **Resolution**: TLS identity in alknet has two distinct use cases, not one:

  **Use case 1 — P2P / key-based identity (default for most alknet nodes):** RFC 7250 raw Ed25519 public keys. No domain, no CA, no cert renewal. The Ed25519 public key IS the node's identity. This is the same model iroh uses with its `NodeId`. It works natively with SSH auth (same key type) and git (SSH key-based auth). `TlsIdentity::RawKey(Ed25519SecretKey)` in `StaticConfig` covers this. As of [ADR-027](decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md), `RawKey` uses `ed25519_dalek::SigningKey` (via an alknet-core wrapper), **not** `iroh::SecretKey` — so raw-key TLS identity is available in quinn-only builds without the `iroh` feature.

  **Use case 2 — Domain-hosted services (relays, public-facing nodes):** X.509 certificates with domain names. Required for browser/WebTransport clients, which don't support RFC 7250. This has two sub-cases:
  - **Manual**: Provide cert/key file paths via `TlsIdentity::X509`. Already specified in `StaticConfig`.
  - **ACME auto-provisioning**: Let's Encrypt via `rustls-acme`. `TlsIdentity::Acme { domains, cache_dir, directory, contact }` carries static config; the endpoint constructs the `AcmeState` async state machine at setup time. Feature-gated behind `acme`. Designed in [ADR-027](decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md). The reverse-proxy project (`/workspace/@alkdev/reverse-proxy`) demonstrates the proven pattern: `AcmeConfig`, `ResolvesServerCertAcme`, TLS-ALPN-01 challenge handling, automatic renewal.

  **Browser constraint**: Browsers require X.509 and don't support RFC 7250. For browser/WebTransport clients, domain-hosted nodes with X.509 certs are mandatory. All other clients (SSH, git, alknet-native) work with raw keys by default.

  The `TlsIdentity` enum in `StaticConfig` captures all four modes (`X509`, `RawKey`, `SelfSigned`, `Acme`). [ADR-027](decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md) records the design decisions for ACME integration and RawKey decoupling.
- **Cross-references**: ADR-010, ADR-027, [config.md](crates/core/config.md), [endpoint.md](crates/core/endpoint.md)

### OQ-13: Operation Path Format and Routing Scope

- **Origin**: [operation-registry.md](crates/call/operation-registry.md)
- **Status**: resolved
- **Door type**: Two-way
- **Priority**: medium
- **Resolution**: alknet-call uses `/{service}/{op}` (e.g., `/fs/readFile`, `/agent/chat`, `/services/list`). This is the correct format for the alknet-call crate — it is not a "Phase 1 simplification" but the right design for this architecture. The `/{node}/{service}/{op}` pattern from the reference implementation served a head/worker routing model that is a separate architectural concern. Remote dispatch (federation / node-level routing) would be a different mechanism at a different layer, not a prefix added to alknet-call's operation paths. If remote dispatch is ever needed, it would be addressed by a separate crate or a routing layer above the operation registry, not by changing alknet-call's path format. Two-way door — the path format can be extended later if needed, but `/{service}/{op}` is the correct design now.
- **Cross-references**: ADR-005, ADR-012
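
A minimal sketch of parsing the `/{service}/{op}` format follows. The type and function names are hypothetical, and rejecting any path with more than two segments is one reading of "not a prefix added to alknet-call's operation paths".

```rust
/// A parsed `/{service}/{op}` operation path.
#[derive(Debug, PartialEq)]
struct OperationPath<'a> {
    service: &'a str,
    op: &'a str,
}

/// Accepts exactly two non-empty segments after the leading slash.
/// A three-segment `/{node}/{service}/{op}` path is rejected.
fn parse_operation_path(path: &str) -> Option<OperationPath<'_>> {
    let rest = path.strip_prefix('/')?;
    let (service, op) = rest.split_once('/')?;
    if service.is_empty() || op.is_empty() || op.contains('/') {
        return None;
    }
    Some(OperationPath { service, op })
}
```

Under this sketch `/fs/readFile` parses into service `fs` and op `readFile`, while `/node1/fs/readFile` is refused rather than silently routed.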

### OQ-14: Batch Operation Semantics

- **Origin**: [call-protocol.md](crates/call/call-protocol.md)
- **Status**: resolved
- **Door type**: Two-way
- **Priority**: low
- **Resolution**: Batch is a client-side pattern: the client sends multiple `call.requested` events with correlated IDs, and the responses arrive independently. This is the correct protocol design, not a simplification to be "upgraded" later. QUIC's stream multiplexing already provides the concurrency and ordering guarantees that batch would need. Batch-specific event types (e.g., `batch.requested`, `batch.responded`) would add protocol complexity without clear benefit over sending multiple `call.requested` events. If a compelling use case for atomic batch semantics emerges, it can be added as a new event type without breaking existing clients. Two-way door.
- **Cross-references**: ADR-012

## Theme: alknet-call

### OQ-15: Call Protocol Client and Adapter Contract

- **Origin**: [call-protocol.md](crates/call/call-protocol.md), [operation-registry.md](crates/call/operation-registry.md), ADR-013
- **Status**: resolved
- **Door type**: One-way
- **Priority**: high
- **Resolution**: `CallClient` opens QUIC connections and shares the dispatch loop with `CallAdapter` — both sides can send and receive `call.requested` once connected. Connection direction (who opened the connection) is independent of call direction (who calls whom). `from_call` adapter discovers remote operations via `services/list` + `services/schema` and registers them with forwarding handlers — same pattern as `from_openapi` and `from_mcp`. `to_openapi` and `to_mcp` project local operations to external protocols. Adapter contract trait (`OperationAdapter`) produces `(OperationSpec, Handler)` pairs. Cross-node call tree: abort cascade (ADR-016) propagates across node boundaries through `from_call` handlers. Credentials for connections come from capabilities (ADR-014). Adapter-registered operations are `Internal` by default (ADR-015). See ADR-017.
- **Cross-references**: ADR-005, ADR-013, ADR-014, ADR-015, ADR-016, ADR-017, [call-protocol.md](crates/call/call-protocol.md), [operation-registry.md](crates/call/operation-registry.md)

### OQ-16: Safe Vault Operations for Call Protocol Exposure

- **Origin**: [operation-registry.md](crates/call/operation-registry.md), ADR-008
- **Status**: resolved
- **Door type**: One-way
- **Priority**: high
- **Resolution**: No vault operations are exposed over the call protocol for now. The vault is accessed only at the assembly layer (CLI binary at startup). Handlers receive secret material through `OperationContext.capabilities`, not by calling vault operations over the wire. The `operation-registry.md` spec previously showed `vault/derive`, `vault/unlock`, and `vault/decrypt` registered as call protocol operations; that contradicted ADR-008's "capability source" model and has been corrected. If a future use case requires exposing a vault operation over the call protocol (e.g., a restricted `vault/public-key` operation that returns only public key material for identity verification), it would require its own ADR with an explicit threat model justification. See ADR-014.
- **Cross-references**: ADR-008, ADR-014, [operation-registry.md](crates/call/operation-registry.md)

### OQ-17: Abort Cascade Semantics for Nested Calls

- **Origin**: [call-protocol.md](crates/call/call-protocol.md), [operation-registry.md](crates/call/operation-registry.md)
- **Status**: resolved
- **Door type**: One-way (protocol schema), two-way (mechanism)
- **Priority**: high
- **Resolution**: `call.aborted` cascades to all non-terminal descendants in the call tree. The CallAdapter walks the tree (indexed by `parent_request_id` in `PendingRequestMap`) and sends `call.aborted` for each descendant. Default policy is `abort-dependents` (abort everything downstream); `continue-running` is an opt-in for long-running work that should survive a parent's abort. Handlers clean up via Rust's async drop semantics (future dropped → `Drop` guards release resources). The cascade is protocol-level (server discovers descendants and propagates); the mechanism (parent-indexed map, cancellation tokens, or a separate graph) is a two-way door. See ADR-016.
- **Cross-references**: ADR-012, ADR-015, ADR-016, [call-protocol.md](crates/call/call-protocol.md), [operation-registry.md](crates/call/operation-registry.md)
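
The tree walk can be sketched over a parent-indexed map. This is one possible mechanism, not the chosen one (the mechanism is a two-way door). The `Pending` fields and `abort_targets` are hypothetical names, and treating a `continue-running` entry as shielding its whole subtree is an assumption the text above does not settle.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum AbortPolicy {
    AbortDependents, // default: abort everything downstream
    ContinueRunning, // opt-in: survive a parent's abort
}

/// Hypothetical per-request bookkeeping entry.
struct Pending {
    parent_request_id: Option<u64>,
    terminal: bool,
    policy: AbortPolicy,
}

/// Returns the IDs of descendants of `root` that should receive `call.aborted`.
/// Assumes the parent links form a tree (no cycles).
fn abort_targets(map: &HashMap<u64, Pending>, root: u64) -> Vec<u64> {
    let mut targets = Vec::new();
    let mut stack = vec![root];
    while let Some(parent) = stack.pop() {
        for (&id, entry) in map {
            if entry.parent_request_id != Some(parent) {
                continue;
            }
            // Assumption: an opted-out entry keeps its own subtree alive too.
            if entry.policy == AbortPolicy::ContinueRunning {
                continue;
            }
            // Terminal entries need no abort, but their children may.
            if !entry.terminal {
                targets.push(id);
            }
            stack.push(id);
        }
    }
    targets.sort_unstable();
    targets
}
```

The linear scan per parent keeps the sketch short; a real map would more likely keep a children index so the walk is proportional to the subtree size.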

### OQ-18: Privilege Model and Authority Context

- **Origin**: [operation-registry.md](crates/call/operation-registry.md)
- **Status**: resolved
- **Door type**: One-way (ACL model), two-way (specific APIs)
- **Priority**: high
- **Resolution**: The `internal` flag on `OperationContext` marks calls that originated from composition (a handler calling another operation via `OperationEnv`), as opposed to external calls that arrived as `call.requested` from a wire client. The `internal` flag switches the authority context: the ACL check runs against the composing handler's identity (set at registration), not the caller's identity and not as a blanket skip. This replaces the previous `trusted` flag, which skipped ACL entirely — a privilege escalation vector. Operations have External/Internal visibility. Internal operations return `NOT_FOUND` when called from the wire and are excluded from `services/list`. The composition env is scoped — a handler can only invoke a declared set of operations. Handler identity is carried on `OperationContext` alongside caller identity (the principal/agent pair). See ADR-015.
- **Cross-references**: ADR-014, ADR-015, [call-protocol.md](crates/call/call-protocol.md), [operation-registry.md](crates/call/operation-registry.md)
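
The authority switch can be sketched as choosing which identity the ACL check runs against. All types here are simplified stand-ins: the scope-set `Identity`, the `check_acl` function, and the `Decision` enum are hypothetical, and OQ-23 later refines the handler side into a declared authority bundle.

```rust
use std::collections::HashSet;

/// Simplified identity: just a set of granted scopes.
struct Identity {
    scopes: HashSet<&'static str>,
}

/// Reduced context carrying both halves of the principal/agent pair.
struct OperationContext<'a> {
    internal: bool,
    caller: Option<&'a Identity>,
    handler: Option<&'a Identity>,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Forbidden,
}

/// The `internal` flag selects the authority; it never skips the check.
fn check_acl(ctx: &OperationContext<'_>, required_scope: &str) -> Decision {
    let authority = if ctx.internal { ctx.handler } else { ctx.caller };
    match authority {
        Some(id) if id.scopes.contains(required_scope) => Decision::Allow,
        _ => Decision::Forbidden,
    }
}
```

Note that an internal call with no handler identity is denied rather than waved through, which is the difference from the retired `trusted` flag.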

### OQ-19: Session-Scoped Operation Registries and Agent-Written Operations

- **Origin**: [operation-registry.md](crates/call/operation-registry.md)
- **Status**: resolved
- **Door type**: Two-way (protocol doesn't need changes), one-way (if implementation closes the door)
- **Priority**: medium
- **Resolution**: The call protocol supports session-scoped registries through `OperationEnv` trait layering. No protocol changes needed. The pattern is documented here and in [operation-registry.md](crates/call/operation-registry.md) to prevent an implementation from accidentally closing it.

  The registry model has three tiers:

  | Tier | Scope | Lifetime | Visibility | Who populates it |
  |------|-------|----------|------------|-------------------|
  | Core (global) | All sessions | Process lifetime, static at startup | External + Internal (curated) | Assembly layer at startup |
  | Session | One session | Session lifetime, dynamic | Internal only (never wire-facing) | Agent during session (sandbox) |
  | Promotion | Session → Core | One-time transition | Manual/curated review | Human or architect agent reviews, then redeploys |

  Session-scoped operations are always `Internal` (ADR-015), run under the handler's identity (the agent handler that authorized the sandbox), can only compose operations in the handler's scoped env, and are ephemeral (gone when the session ends). Core operations are curated — reviewed before promotion. The promotion path is the curation checkpoint where autonomous (session-scoped) becomes curated (core). This is not auto-promotion.

  **Implementation guard**: `OperationEnv` must remain a trait, not a concrete type. A session-scoped env wraps the global env (check session registry first, fall through to global). Making `OperationEnv` concrete or hardcoding the global registry into the dispatch path would close this pattern. The static registration constraint (OQ-04) applies to the curated (Layer 0) registry only; session registries are dynamic by nature and are a different registry overlaying the curated one. **Generalized by ADR-024**: connection-scoped remote imports (`from_call`) use the same overlay mechanism as session-scoped ops. Both are per-scope dynamic overlays on the static curated base, composed into the per-call `OperationContext.env` by the `CallAdapter`. `OperationEnv` being a trait object (`Arc<dyn OperationEnv + Send + Sync>`) is what enables both overlay patterns.

  Session-scoped operations run in a locked-down sandbox (no direct net/fs/env access), can only reach operations in the handler's scoped env, and their output should be validated against their declared schema before returning. The promotion path requires review — an agent with a `promote` scope (the architect role) performs the promotion; the writing agent (lower-privileged role) requests it. This is the role-based escalation pattern (ADR-015): privileges escalate through a chain of command, not through direct authority.

  The agent-specific mechanism (quickjs sandbox, session registry lifecycle, promotion workflow) belongs to the agent crate spec. The call protocol's job is to keep the `OperationEnv` trait composable and the visibility/ACL model consistent across tiers.
- **Cross-references**: OQ-04, ADR-014, ADR-015, ADR-016, ADR-024, [operation-registry.md](crates/call/operation-registry.md)
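
The overlay pattern from the implementation guard can be sketched as a session env that checks its own registry first and falls through to a wrapped base env. The `invoke` signature here is synchronous and string-typed purely for illustration, and `GlobalEnv`, `SessionEnv`, and `Op` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in for the real trait; the real `invoke` is not
/// synchronous or string-typed.
trait OperationEnv: Send + Sync {
    fn invoke(&self, op: &str, input: &str) -> Result<String, String>;
}

/// Hypothetical handler shape for this sketch.
type Op = Box<dyn Fn(&str) -> String + Send + Sync>;

/// The static, curated base registry.
struct GlobalEnv {
    ops: HashMap<String, Op>,
}

impl OperationEnv for GlobalEnv {
    fn invoke(&self, op: &str, input: &str) -> Result<String, String> {
        self.ops
            .get(op)
            .map(|f| f(input))
            .ok_or_else(|| format!("NOT_FOUND: {op}"))
    }
}

/// A per-session overlay: session ops first, then the wrapped base env.
struct SessionEnv {
    session_ops: HashMap<String, Op>,
    base: Arc<dyn OperationEnv>,
}

impl OperationEnv for SessionEnv {
    fn invoke(&self, op: &str, input: &str) -> Result<String, String> {
        match self.session_ops.get(op) {
            Some(f) => Ok(f(input)),
            None => self.base.invoke(op, input),
        }
    }
}
```

Because `SessionEnv` only depends on `Arc<dyn OperationEnv>`, the same wrapper shape would serve a connection-scoped overlay; a concrete `OperationEnv` type would make this wrapping impossible.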

## Theme: alknet-vault

### OQ-20: Salt/KDF and Encryption Key Derivation Method

- **Origin**: [encryption.md](crates/vault/encryption.md)
- **Status**: resolved
- **Door type**: One-way (key derivation method), two-way (salt field usage)
- **Priority**: high
- **Resolution**: The vault uses SLIP-0010 HD derivation from the BIP39 seed at path `m/74'/2'/0'/0'` to produce the AES-256-GCM encryption key — not PBKDF2. The `salt` field in `EncryptedData` is unused for key derivation (kept for wire-format compatibility with the TS predecessor). The TypeScript `@alkdev/storage` crypto module used PBKDF2 with a password + salt; data encrypted by that method (key_version=1) cannot be decrypted by the vault and must be migrated via one-time re-encryption to key_version=2. See ADR-020 for the full rationale and migration path.
- **Cross-references**: ADR-020, [encryption.md](crates/vault/encryption.md)

### OQ-21: Remote Vault Administration

- **Origin**: [service.md](crates/vault/service.md), [protocol.md](crates/vault/protocol.md), ADR-019
- **Status**: resolved
- **Door type**: One-way (vault crate is local-only by construction)
- **Priority**: medium
- **Resolution**: Remote vault access is **not a feature of the vault crate**. ADR-025 dropped irpc from the vault, making the vault local-only by construction — no `RemoteService` trait, no wire format for vault messages, no default-insecure remote handler. The vault's API is `VaultServiceHandle` (direct method calls), nothing else.

  If remote vault access is ever needed (e.g., the machine→worker pattern), it requires a **separate vault-server crate** that depends on both alknet-core (for `IdentityProvider`, scopes, auth-wrapping) and alknet-vault (for `VaultServiceHandle`). That crate would define its own threat model, access policy, operation filtering (Unlock/Lock local-only), and wire format — and requires its own ADR. This is a deliberate addition, not a flag flip on a default that was already loaded.

  The pre-ADR-025 deferral framed remote access as "non-breaking" (the wire format was additive). That framing was misleading: once workers build dependencies on the remote vault API, disabling it breaks them — the door is operationally one-way even if the wire format is additive. ADR-025 inverts the default: the vault is local-only by construction, and remote access requires building something new, not removing a default.

  Per-node vaults are the recommended pattern for multi-node deployments: each node has its own vault and mnemonic; credentials are encrypted *for* the receiving node's public key, not decrypted centrally. This is end-to-end encryption between nodes, matching ADR-008's "capability source" model.
- **Cross-references**: ADR-005, ADR-008, ADR-014, ADR-018, ADR-019, ADR-025, [protocol.md](crates/vault/protocol.md), [service.md](crates/vault/service.md)

### OQ-22: Key Rotation Mechanism

- **Origin**: [encryption.md](crates/vault/encryption.md)
- **Status**: resolved
- **Door type**: One-way (path scheme), two-way (rotation policy)
- **Priority**: medium
- **Resolution**: Key rotation uses version-indexed derivation paths. Each key version maps to a distinct SLIP-0010 path: `m/74'/2'/0'/{version-2}'`. v2 (current) is at `m/74'/2'/0'/0'`; v3 is at `m/74'/2'/0'/1'`; etc. The `decrypt` method derives the key at the path indicated by `encrypted.key_version` (not always at `PATHS::ENCRYPTION`). The `rotate` method decrypts with the old version's key and re-encrypts with the new version's key — no new mnemonic needed. The assembly layer or a migration tool iterates stored blobs and calls `rotate` on each; the vault does not self-rotate. Partial rotation is safe (old keys remain derivable). See ADR-021.
- **Cross-references**: ADR-020, ADR-021, [encryption.md](crates/vault/encryption.md), [service.md](crates/vault/service.md)
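
The version-to-path mapping is simple arithmetic and can be sketched as follows. The function and constant names are hypothetical; the path template and the `version - 2` offset come from the resolution above, and the upper bound reflects that hardened child indices must stay below 2^31.

```rust
/// First key version derived via SLIP-0010 (v1 was the PBKDF2 scheme).
const FIRST_HD_KEY_VERSION: u32 = 2;

/// Hardened child indices must be below 2^31.
const HARDENED_LIMIT: u32 = 1 << 31;

/// Returns the SLIP-0010 path for a key version, or `None` for versions
/// that are not HD-derived or whose index would not fit.
fn encryption_path(key_version: u32) -> Option<String> {
    let index = key_version.checked_sub(FIRST_HD_KEY_VERSION)?;
    if index >= HARDENED_LIMIT {
        return None;
    }
    Some(format!("m/74'/2'/0'/{index}'"))
}
```

A `decrypt` built on this would look up the path from `encrypted.key_version` rather than assuming the current version, which is what keeps partially rotated data readable.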

### OQ-23: Handler Identity Registration Path and Composition Authority

- **Origin**: [operation-registry.md](crates/call/operation-registry.md), [call-protocol.md](crates/call/call-protocol.md), ADR-015
- **Status**: resolved
- **Door type**: One-way (security model), two-way (bundle shape)
- **Priority**: high
- **Resolution**: ADR-015 said handler identity was "set at registration by the assembly layer" but the registration API (`register(spec, handler)`) had no place for it — meaning every internal call would check ACL against `None`, reproducing the escalation gap ADR-015 was written to close. ADR-022 resolves this with a registration bundle (`HandlerRegistration`) carrying `provenance`, `composition_authority` (replacing `handler_identity: Identity` — it's a declared authority bundle, not a peer identity), `scoped_env`, and `capabilities`. The dispatch path (`build_root_context` and `OperationEnv::invoke()`) reads from the bundle. Provenance determines which ops can compose: only `Local` and `Session` get composition authority; leaves (`FromOpenAPI`, `FromMCP`, `FromCall`) get `None` — they don't compose, so they don't need it. Capabilities are per-request on `OperationContext`, populated from the bundle (resolving the closure-capture vs context ambiguity). The kernel/user analogy: user's authority checked once at the External gate; handler's composition authority used for all composition inside; scoped env bounds reachability. No intersection — the user's authority does not limit internal calls. See ADR-022.
- **Cross-references**: ADR-014, ADR-015, ADR-022, docs/reviews/001-pre-implementation-architecture-sanity-check.md (C1–C4), [operation-registry.md](crates/call/operation-registry.md), [call-protocol.md](crates/call/call-protocol.md)
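
The provenance rule can be sketched as a constructor that refuses mismatched bundles. This is a reduced, hypothetical shape: the real `HandlerRegistration` also carries `scoped_env` and `capabilities`, the authority is modelled here as a plain list of scope strings, and requiring composing operations to declare an authority is an assumption drawn from the gap described above.

```rust
/// Where a registration came from (variant names follow the text above).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Provenance {
    Local,
    Session,
    FromOpenAPI,
    FromMCP,
    FromCall,
}

impl Provenance {
    /// Only `Local` and `Session` operations compose other operations.
    fn may_compose(self) -> bool {
        matches!(self, Provenance::Local | Provenance::Session)
    }
}

/// Hypothetical, reduced shape of the registration bundle.
struct HandlerRegistration {
    provenance: Provenance,
    /// Declared authority used for composition; `None` for leaves.
    composition_authority: Option<Vec<String>>,
}

impl HandlerRegistration {
    /// Rejects bundles whose authority does not match their provenance.
    fn new(
        provenance: Provenance,
        composition_authority: Option<Vec<String>>,
    ) -> Result<Self, &'static str> {
        match (provenance.may_compose(), composition_authority.is_some()) {
            (false, true) => Err("leaf operations must not carry composition authority"),
            (true, false) => Err("composing operations need a declared authority"),
            _ => Ok(Self { provenance, composition_authority }),
        }
    }
}
```

Validating at registration time means the dispatch path never has to decide what an ACL check against a missing authority should mean.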

### OQ-24: Operation Error Schemas

- **Origin**: [operation-registry.md](crates/call/operation-registry.md), [call-protocol.md](crates/call/call-protocol.md), ADR-017
- **Status**: resolved
- **Door type**: One-way (wire format), two-way (mapping mechanism)
- **Priority**: high
- **Resolution**: `OperationSpec` gains `error_schemas: Vec<ErrorDefinition>` where each `ErrorDefinition` carries a `code`, `description`, `schema` (JSON Schema for the error detail payload), and optional `http_status` (for adapter projection). The `call.error` payload gains an optional `details` field carrying the typed error payload. Protocol-level codes (`NOT_FOUND`, `FORBIDDEN`, `INVALID_INPUT`, `INTERNAL`, `TIMEOUT`) are distinct from operation-level domain codes (`FILE_NOT_FOUND`, `RATE_LIMITED`, etc.) — protocol codes are emitted by the dispatch machinery, operation codes by handlers. `from_openapi`/`to_openapi` map OpenAPI response status codes to/from `ErrorDefinition`s, making the adapter contract from ADR-017 faithful on the error axis. `services/schema` exposes `error_schemas` for client code generation. See ADR-023.
- **Cross-references**: ADR-017, ADR-023, docs/reviews/001-pre-implementation-architecture-sanity-check.md (C5), [operation-registry.md](crates/call/operation-registry.md), [call-protocol.md](crates/call/call-protocol.md)
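
A minimal sketch of the error-definition shape and the protocol/operation code split follows. Field types are assumptions (in particular, `schema` is held as JSON Schema text rather than a parsed value), and `is_protocol_code` and `error_for_status` are hypothetical helpers, not part of ADR-023.

```rust
/// Codes emitted by the dispatch machinery, as listed above.
const PROTOCOL_CODES: [&str; 5] =
    ["NOT_FOUND", "FORBIDDEN", "INVALID_INPUT", "INTERNAL", "TIMEOUT"];

/// Field types are assumptions; `schema` holds JSON Schema text here.
struct ErrorDefinition {
    code: String,
    description: String,
    schema: String,
    http_status: Option<u16>,
}

/// True for protocol-level codes, false for operation-level domain codes.
fn is_protocol_code(code: &str) -> bool {
    PROTOCOL_CODES.iter().any(|c| *c == code)
}

/// Picks the declared error matching an HTTP status, as a
/// `from_openapi`-style adapter might when mapping a response.
fn error_for_status(defs: &[ErrorDefinition], status: u16) -> Option<&ErrorDefinition> {
    defs.iter().find(|d| d.http_status == Some(status))
}
```

An operation declaring `FILE_NOT_FOUND` with `http_status` 404 would be found by `error_for_status(&defs, 404)`, while `NOT_FOUND` itself remains reserved for the dispatch layer.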

## Theme: Call Client and Adapters

These open questions are the remainders from the call-completion gap analysis
(`docs/research/alknet-call-completion/gap-analysis.md`, DC-1..4) and the
peer-graph routing research (`docs/research/alknet-call-peer-routing/findings.md`).
ADR-029 supersedes ADR-028 and dissolves OQ-25 and the cross-peer half of
OQ-28; the remaining two-way-door shape/defaults are recorded in
[client-and-adapters.md](crates/call/client-and-adapters.md) and may be
revisited during implementation without a new ADR.

### OQ-25: ~~Remote-Safe Marking Shape for CallClient Peer-Scoped Filtering~~ (Dissolved by ADR-029)

- **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 (§1 Consequences), ADR-028
- **Status**: **dissolved** (ADR-029)
- **Door type**: ~~Two-way (shape only — existence is one-way, resolved by ADR-028)~~
- **Priority**: ~~medium~~
- **Resolution**: **Dissolved by [ADR-029](decisions/029-peer-graph-routing-model.md).**
  ADR-028's `remote_safe: bool` / `trusted_peer` model is superseded — it was a
  parallel, weaker authorization system that duplicated the existing
  `AccessControl`/`Identity` machinery. ADR-029 retires `remote_safe`/
  `trusted_peer` entirely; peer authorization flows through
  `AccessControl::check(peer_identity)`. The op's `AccessControl` *is* the
  peer-authorization policy — there is no separate marking. Per-peer
  differentiation is via `IdentityProvider` config (different peers get
  different scopes), not a per-op boolean. The "shape" question is moot
  because there is no marking to shape. See ADR-029 §3.
- **Cross-references**: ADR-009, ADR-014, ADR-015, ADR-017, ADR-022, ADR-024,
  ~~ADR-028~~ (superseded), ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md),
  [operation-registry.md](crates/call/operation-registry.md)

### OQ-26: OperationAdapter Error Type (AdapterError Variants)

- **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 §5, [ADR-029](decisions/029-peer-graph-routing-model.md) §5
- **Status**: **resolved** (2026-06-27)
- **Door type**: Two-way
- **Priority**: medium
- **Resolution**: The `AdapterError` enum is `#[non_exhaustive]` +
  `thiserror::Error`, with these v1 variants:
  - `DiscoveryFailed { message: String }` — `from_call` remote unreachable / `services/list` failed
  - `SchemaParse { message: String }` — `from_openapi` / `from_jsonschema` couldn't parse the spec
  - `Transport { message: String }` — underlying transport error (QUIC for `from_call`, HTTP for `from_openapi`/`from_mcp`)
  - `Unauthorized { message: String }` — HTTP 401 for `from_openapi`/`from_mcp`, auth rejected for `from_call`
  - `SamePeerCollision { message: String }` — namespace collision *within a single peer* (ADR-029 §5: cross-peer collision dissolves; same-peer collision stays an error). Replaces the flat `Conflict` variant from the pre-ADR-029 implementation.

  `#[non_exhaustive]` lets new variants be added later (for example, for
  `alknet-http`'s adapters) without breaking downstream match arms. The
  variant payloads are `String` messages, kept simple and `Send + Sync` by
  construction. This matches the shipped implementation
  (`crates/alknet-call/src/client/mod.rs`), except that the ADR-029 migration
  renames `Conflict` to `SamePeerCollision`. Two-way door: adding variants
  later is non-breaking; renaming a variant is a match-arm update, not an
  architectural change.
- **Cross-references**: ADR-017, ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md)
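
A dependency-free sketch of the v1 variant set follows. The resolution above derives `thiserror::Error`; here `Display` and `std::error::Error` are implemented by hand so the example compiles with the standard library alone. The display strings and the `is_retryable` helper are assumptions for illustration, not part of the recorded decision.

```rust
use std::fmt;

/// The five v1 variants named in the resolution above.
#[derive(Debug)]
#[non_exhaustive]
pub enum AdapterError {
    DiscoveryFailed { message: String },
    SchemaParse { message: String },
    Transport { message: String },
    Unauthorized { message: String },
    SamePeerCollision { message: String },
}

impl fmt::Display for AdapterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Display prefixes are assumed wording, not the shipped messages.
        let (kind, message) = match self {
            AdapterError::DiscoveryFailed { message } => ("discovery failed", message),
            AdapterError::SchemaParse { message } => ("schema parse error", message),
            AdapterError::Transport { message } => ("transport error", message),
            AdapterError::Unauthorized { message } => ("unauthorized", message),
            AdapterError::SamePeerCollision { message } => ("same-peer collision", message),
        };
        write!(f, "{kind}: {message}")
    }
}

impl std::error::Error for AdapterError {}

/// Hypothetical caller-side classification. The wildcard arm is what a
/// downstream crate must write for a `#[non_exhaustive]` enum, and it is
/// why adding a variant later does not break that crate.
pub fn is_retryable(err: &AdapterError) -> bool {
    match err {
        AdapterError::DiscoveryFailed { .. } | AdapterError::Transport { .. } => true,
        _ => false,
    }
}
```

Because every payload is a `String`, the enum is `Send + Sync` without further bounds.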

### OQ-27: from_call Re-Import Trigger

- **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 Assumption 4
- **Status**: open
- **Door type**: Two-way
- **Priority**: low
- **Resolution**: ADR-017 Assumption 4 noted re-import "happens on
  reconnection or is triggered explicitly." The v1 default is
  **auto-re-import on connection establishment**. The overlay is
  per-connection (Layer 2, ADR-024), so a stale overlay dies with the
  connection; re-import on reconnect is naturally scoped to the new
  connection. This is the right default for the runner pattern (a worker
  reconnects → the hub re-discovers the worker's ops automatically).
  Explicit re-import via a future `CallConnection::refresh()` method is
  additive and can be added if a deployment needs manual control. Reversal
  is cheap; no ADR needed.
- **Cross-references**: ADR-017, ADR-024, [client-and-adapters.md](crates/call/client-and-adapters.md)

### OQ-28: from_call Namespace Collision Behavior

- **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 §3
- **Status**: open
- **Door type**: Two-way
- **Priority**: low
- **Resolution**: ADR-017 §3's `FromCallConfig` namespace prefix is
  **optional, default no prefix, collision = error**. A node importing from
  two remotes that both expose `/container/exec` without prefixes should fail
  loudly rather than silently overwrite. The operator adds prefixes when they
  know they're importing from multiple sources. This matches the
  default-deny, explicit-allow posture (ADR-015, ADR-028). Reversal is cheap;
  no ADR needed. The alternative (last-wins) would silently mask one
  remote's op behind another's, which is the kind of surprise the
  default-deny posture exists to avoid.

  **Cross-peer collision dissolved by ADR-029.** Under the peer-keyed overlay
  model, the same name on different peers is fine: the ops live in separate
  peer sub-overlays, so there is no collision and no prefix is needed. The
  collision rule now applies only *within* a peer (the same name on the same
  peer is still an error, since a peer shouldn't expose two ops with the same
  name). `FromCallConfig::namespace_prefix` becomes optional local-naming
  sugar, not the disambiguation mechanism. See ADR-029 §5.
- **Cross-references**: ADR-015, ADR-017, ~~ADR-028~~ (superseded), ADR-029,
  [client-and-adapters.md](crates/call/client-and-adapters.md)
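
The post-ADR-029 collision rule can be sketched with a nested map keyed by peer. This is an illustration under assumptions: `PeerOverlays`, its methods, the standalone `SamePeerCollision` struct, and the `u64`-backed `PeerId` are hypothetical stand-ins, and op specs are reduced to opaque strings.

```rust
use std::collections::HashMap;

/// Stand-in for the logical peer id (a plain integer in this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PeerId(pub u64);

/// Hypothetical error value mirroring `AdapterError::SamePeerCollision`.
#[derive(Debug, PartialEq)]
pub struct SamePeerCollision {
    pub peer: PeerId,
    pub name: String,
}

/// Peer-keyed overlay: each peer owns its own name -> spec map.
#[derive(Default)]
pub struct PeerOverlays {
    peers: HashMap<PeerId, HashMap<String, String>>,
}

impl PeerOverlays {
    /// Import one op into `peer`'s sub-overlay. A duplicate name on the same
    /// peer is an error; the same name on another peer is not.
    pub fn import(&mut self, peer: PeerId, name: &str, spec: &str) -> Result<(), SamePeerCollision> {
        let ops = self.peers.entry(peer).or_default();
        if ops.contains_key(name) {
            return Err(SamePeerCollision { peer, name: name.to_string() });
        }
        ops.insert(name.to_string(), spec.to_string());
        Ok(())
    }

    /// Look up an op in one peer's sub-overlay.
    pub fn lookup(&self, peer: PeerId, name: &str) -> Option<&str> {
        self.peers.get(&peer)?.get(name).map(String::as_str)
    }
}
```

Two peers can both import `/container/exec` without a prefix, while a second import of the same name on one peer fails loudly instead of overwriting the first.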

### OQ-29: CallClient TLS Client-Auth and Remote-Identity Verification

- **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 §7
- **Status**: open
- **Door type**: Two-way
- **Priority**: medium
- **Resolution**: v1 `CallClient::connect()` builds the quinn client config
  with `with_no_client_auth()` and an `AcceptAnyServerCertVerifier` — the
  client does not present its TLS identity (`credentials.tls_identity`) as a
  client cert, and does not pin the remote's expected identity from
  `credentials.remote_identity`. The server-side
  `AcceptAnyCertVerifier` (in alknet-core's endpoint) does not require or
  verify client certs, so a client cert is not needed to establish a
  connection in v1. Wiring the local node's RawKey/X509 identity as a rustls
  client-auth cert (for servers that *do* verify client identity) and
  plugging `credentials.remote_identity` into a real `ServerCertVerifier` is
  additive — a two-way-door remainder surfaced during implementation.
  **The one-way constraint (credentials from `Capabilities`, not env vars,
  ADR-014) is unaffected**: the `auth_token` dimension flows through the
  call-protocol `auth_token` payload field, not TLS, so the no-env-vars
  invariant holds independently of this gap. To be decided during a future
  task that wires RawKey client-auth; recorded here, not in a full ADR.
- **Cross-references**: ADR-014, ADR-017, ADR-027, [client-and-adapters.md](crates/call/client-and-adapters.md), [endpoint.md](crates/core/endpoint.md)

### OQ-30: PeerRef::Any Routing Policy

- **Origin**: [ADR-029](decisions/029-peer-graph-routing-model.md) §2, [client-and-adapters.md](crates/call/client-and-adapters.md), `docs/research/alknet-call-peer-routing/findings.md` §3.2
- **Status**: open
- **Door type**: Two-way
- **Priority**: low
- **Resolution**: v1 `PeerRef::Any` uses insertion-order first-match —
  deterministic but order-dependent (worker A connects before worker B → `Any`
  routes to A until A disconnects). This is the simplest routing policy and is
  correct for the immediate use case (the head picks the first worker that
  serves the op). A richer `RoutingPolicy` (round-robin, least-loaded,
  affinity) is the two-way-door remainder; the `PeerRef` enum is designed to
  compose with a `Route { selector, policy }` struct without breaking the
  `invoke_peer` signature. To be decided during implementation when a fan-out
  use case needs it; recorded here, not in a full ADR.
- **Cross-references**: ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md)
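
Insertion-order first-match can be sketched with a `Vec`, which preserves connection order. The `ConnectedPeers` type and its methods are hypothetical, `PeerId` is a stand-in integer, and only the two `PeerRef` variants named in this document are modelled.

```rust
use std::collections::HashSet;

/// Stand-in for the logical peer id (a plain integer in this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PeerId(pub u64);

/// Routing selector with the two variants named in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PeerRef {
    Any,
    Specific(PeerId),
}

/// Hypothetical table of connected peers; the `Vec` keeps connection order.
#[derive(Default)]
pub struct ConnectedPeers {
    peers: Vec<(PeerId, HashSet<String>)>,
}

impl ConnectedPeers {
    pub fn connect(&mut self, peer: PeerId, ops: &[&str]) {
        let ops = ops.iter().map(|op| op.to_string()).collect();
        self.peers.push((peer, ops));
    }

    pub fn disconnect(&mut self, peer: PeerId) {
        self.peers.retain(|(id, _)| *id != peer);
    }

    /// `Any` picks the earliest-connected peer that serves `op`;
    /// `Specific` only considers the named peer.
    pub fn resolve(&self, target: PeerRef, op: &str) -> Option<PeerId> {
        self.peers
            .iter()
            .filter(|(id, _)| match target {
                PeerRef::Any => true,
                PeerRef::Specific(want) => *id == want,
            })
            .find(|(_, ops)| ops.contains(op))
            .map(|(id, _)| *id)
    }
}
```

If worker 1 connects before worker 2 and both serve the same op, `PeerRef::Any` resolves to worker 1 until it disconnects, after which it resolves to worker 2.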

### OQ-31: services/list-peers Re-Export Semantics

- **Origin**: [ADR-029](decisions/029-peer-graph-routing-model.md) §6, `docs/research/alknet-call-peer-routing/findings.md` §3.5
- **Status**: open
- **Door type**: Two-way
- **Priority**: low
- **Resolution**: v1 defaults to "own ops only" — `services/list` shows the
  head's own Layer 0 `External` ops, filtered by `AccessControl::check(calling_peer)`,
  unchanged from today (minus the `remote_safe` filter). A `services/list-peers`
  opt-in (new built-in operation) lists the peer overlays with attribution:
  each peer's sub-overlay listed as `{ peer: Option<PeerId>, operations: [...] }`,
  filtered by the calling peer's authorization. Whether re-exported peer ops
  are listed by default, opt-in, or per-peer-policy is the two-way-door
  remainder; v1 is opt-in (`services/list-peers`). The re-export policy is an
  `AccessControl` decision on the listing op. To be decided during
  implementation when a consumer needs peer-attributed discovery; recorded
  here, not in a full ADR.
- **Cross-references**: ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md)

### OQ-32: Multi-Hop Federation

- **Origin**: [ADR-029](decisions/029-peer-graph-routing-model.md) §3.7, `docs/research/alknet-call-peer-routing/findings.md` §3.7
- **Status**: open
- **Door type**: One-way (federation model), two-way (mechanism)
- **Priority**: low
- **Resolution**: v1 is one-hop — worker A does not transitively see worker
  B's ops through the head unless the head explicitly re-exports them. The
  peer-keyed overlay model extends to multi-hop without redesign (a chain of
  `PeerRef::Specific` routing decisions), but path-finding (which peer reaches
  which op transitively) is where a graph library (petgraph) would pay off.
  For v1 (one hop, shallow), a nested `HashMap<PeerId, HashMap<String, ...>>`
  suffices. Whether multi-hop federation becomes a real use case is a future
  decision; the peer-keyed model does not foreclose it. Not designed; tracked
  here so the v1 model's extensibility is recorded.
- **Cross-references**: ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md)

### OQ-33: PeerId — Cryptographic Identity vs Stable Logical Identifier

- **Origin**: [ADR-029](decisions/029-peer-graph-routing-model.md) Assumption 1, `docs/research/alknet-call-peer-routing/findings.md` §6.1
- **Status**: **resolved** (2026-06-27)
- **Door type**: One-way (composition semantics), two-way (id source)
- **Priority**: high
- **Resolution**: `PeerId` is a **logical identifier, decoupled from the
  cryptographic identity**. It is *not* `Identity.id` (the TLS fingerprint or
  API-key prefix) — those change on key rotation, which would break every
  in-flight `PeerRef::Specific` and every ACL entry referencing that peer.

  **v1 source**: connection-assigned UUID (v4) at `connect()`/`accept()` time.
  Stable for the connection's lifetime; changes on reconnect. This is a
  **no-storage workaround** — the project has deliberately avoided a DB
  backend for the core crates (smaller, fewer deps, simpler testing), which
  has served the local-only crates (vault, registry) well. But peer identity
  is the first *cross-node* state that wants persistence: what we actually
  want is a persistent mapping from a logical peer identity to its current
  cryptographic material, updated on key rotation, surviving restarts.
  Without a DB, the UUID is the least-bad ephemeral option: the failure
  mode (an in-flight `PeerRef::Specific` gets `NOT_FOUND` after a reconnect)
  is acceptable for v1, and re-running `from_call` produces a fresh `PeerRef`.

  **The real solution (future, tracked as OQ-34):** a persistent peer
  registry — a mapping from a stable logical peer identity (configured node
  name or registered identity) to its current cryptographic material,
  persisted across restarts and key rotations. This is what makes the
  ACL-stability concern below work correctly: the ACL entry keys on the
  logical name, the peer registry tracks the current crypto identity for
  that name, and key rotation becomes a vault-only operation with no ACL
  update on the remote side. The no-DB posture of the core crates means
  this registry lives outside the core — likely in a service crate or an
  assembly-layer store — not in alknet-call itself. See OQ-34.

  **Key-rotation / ACL note (context for the future, not a v1 decision):**
  if `PeerId` were the fingerprint, rotating a node's TLS key would change
  its `PeerId`, invalidating every ACL entry that references that peer. The
  vault makes local key rotation easy (derive a new key, re-encrypt,
  ADR-021); the problem is the *remote* side's ACL — the hub's
  `authorized_fingerprints` / `AccessControl` entries that reference the old
  fingerprint. Decoupling `PeerId` from the crypto material means the ACL
  entry *can* persist across key rotation — but only if there's a store that
  maps the logical name to the new crypto identity after rotation. That
  store is OQ-34. The v1 decision (logical id, not crypto; UUID source)
  keeps the door open for it without requiring it now.

  **The one-way door:** `PeerId` is a logical id, not `Identity.id`. This
  determines the `PeerCompositeEnv` key type, the `PeerRef::Specific`
  payload type, and the `ScopedPeerEnv.peer_pinned` entry shape. Reversing
  it (switching to `Identity.id`) would break the peer-keyed overlay, the
  routing selector, and the reachability set simultaneously. The *source* of
  the logical id (UUID now, peer registry later) is the two-way-door
  remainder — switching from UUID to a persistent registry changes the
  id-generation path, not the composition model.
- **Cross-references**: ADR-009, ADR-014, ADR-015, ADR-017, ADR-021, ADR-027,
  ADR-029, OQ-34, [client-and-adapters.md](crates/call/client-and-adapters.md),
  [operation-registry.md](crates/call/operation-registry.md),
  [auth.md](crates/core/auth.md)
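
The v1 behaviour (a connection-assigned logical id, with the fingerprint held as data rather than as the key) can be sketched as below. This rests on stated assumptions: a process-wide counter stands in for the UUID v4 so that no external crate is needed, and `PeerEntry`, `PeerTable`, and their methods are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Logical peer id. v1 uses a UUID v4; a counter stands in for it here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PeerId(u64);

static NEXT_PEER_ID: AtomicU64 = AtomicU64::new(1);

impl PeerId {
    /// Assign a fresh id at connect/accept time.
    pub fn assign() -> Self {
        PeerId(NEXT_PEER_ID.fetch_add(1, Ordering::Relaxed))
    }
}

/// Hypothetical per-connection record. The fingerprint is data about the
/// peer; it is never used as the map key.
pub struct PeerEntry {
    pub fingerprint: String,
    pub ops: Vec<String>,
}

#[derive(Default)]
pub struct PeerTable {
    peers: HashMap<PeerId, PeerEntry>,
}

impl PeerTable {
    pub fn accept(&mut self, fingerprint: &str, ops: &[&str]) -> PeerId {
        let id = PeerId::assign();
        let entry = PeerEntry {
            fingerprint: fingerprint.to_string(),
            ops: ops.iter().map(|op| op.to_string()).collect(),
        };
        self.peers.insert(id, entry);
        id
    }

    pub fn disconnect(&mut self, peer: PeerId) {
        self.peers.remove(&peer);
    }

    /// Resolve a `PeerRef::Specific`-style lookup; stale ids yield NOT_FOUND.
    pub fn resolve(&self, peer: PeerId, op: &str) -> Result<&PeerEntry, &'static str> {
        self.peers
            .get(&peer)
            .filter(|entry| entry.ops.iter().any(|o| o == op))
            .ok_or("NOT_FOUND")
    }
}
```

A reconnect produces a new `PeerId`, so a lookup with the old id fails with `NOT_FOUND`, which is the v1 failure mode accepted above.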

### OQ-34: Persistent Peer Registry (Cross-Node State Storage)

- **Origin**: OQ-33 (the storage dimension it surfaced), the no-DB posture of ADR-008/018/025
- **Status**: open
- **Door type**: One-way (storage boundary), two-way (backend choice)
- **Priority**: medium (not a v1 blocker — UUID works for v1; becomes real
  when key rotation across nodes or peer-attribution persistence matters)
- **Resolution**: The core crates (alknet-core, alknet-call, alknet-vault)
  are deliberately storage-free — no DB, no persistence layer, in-memory
  state only. This has kept the core small and testable, and it works for
  local-only state (vault key rotation is version-indexed paths, no DB
  needed, ADR-021). **Peer identity is the first cross-node state that
  wants persistence**: a stable logical peer identity mapped to its current
  cryptographic material, surviving restarts and key rotations. The v1
  workaround (OQ-33: connection-assigned UUID) is ephemeral — it works for
  the immediate use case (head→workers, operator-controlled, reconnects
  produce a fresh UUID) but doesn't support ACL entries that persist across
  key rotation, because there's nowhere to store "worker-a's current crypto
  identity is X."

  **What this OQ tracks (not designed, not a v1 decision):**
  - Whether a persistent peer registry belongs in a service crate (e.g., an
    `alknet-registry` or `alknet-peer-store`), in the assembly layer (a
    SQLite file the binary owns), or as a new alknet-core abstraction
    (a `PeerRegistry` trait with no built-in impl, like `IdentityProvider`).
  - Whether the no-DB posture extends to "core has a trait, service has the
    impl" (the `IdentityProvider` pattern) or stays "core is storage-free,
    persistence is entirely outside the crate graph."
  - The backend choice (SQLite, a key-value store, a config file) is the
    two-way-door remainder; the *storage boundary* (does core know about
    persistence at all?) is the one-way door.

  **Why this is a one-way door on the storage boundary, not a two-way door:**
  if core gains a `PeerRegistry` trait, downstream crates depend on it and
  the trait shape becomes a contract. If core stays storage-free, the
  registry lives in a service crate and core never knows about persistence.
  Reversing either direction breaks downstream consumers. The decision
  should be made when a concrete use case (key rotation across nodes,
  durable peer attribution, multi-hop federation with OQ-32) forces it,
  not before.

  **Not a v1 blocker.** The UUID works for v1; this OQ exists so the
  no-DB posture's limit is tracked and the decision is made deliberately
  when it's needed, not accidentally when someone bolts a SQLite file onto
  the assembly layer and it becomes load-bearing.
- **Cross-references**: ADR-008, ADR-018, ADR-021, ADR-025, ADR-029, OQ-33,
  [auth.md](crates/core/auth.md), [config.md](crates/core/config.md)
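
To make the "core has a trait, service has the impl" option concrete, here is a purely illustrative sketch. Nothing in it is a decision: the `PeerRegistry` method names, the `InMemoryPeerRegistry` type, and the `is_authorized` helper are all hypothetical, and an in-memory map stands in for whatever backend is eventually chosen.

```rust
use std::collections::HashMap;

/// Hypothetical trait shape: logical peer name -> current crypto identity.
pub trait PeerRegistry {
    fn current_fingerprint(&self, logical_name: &str) -> Option<&str>;
    fn record_rotation(&mut self, logical_name: &str, new_fingerprint: &str);
}

/// Stand-in implementation; a real one would persist across restarts.
#[derive(Default)]
pub struct InMemoryPeerRegistry {
    entries: HashMap<String, String>,
}

impl PeerRegistry for InMemoryPeerRegistry {
    fn current_fingerprint(&self, logical_name: &str) -> Option<&str> {
        self.entries.get(logical_name).map(String::as_str)
    }

    fn record_rotation(&mut self, logical_name: &str, new_fingerprint: &str) {
        self.entries
            .insert(logical_name.to_string(), new_fingerprint.to_string());
    }
}

/// Hypothetical ACL check keyed on logical names: the ACL never stores a
/// fingerprint, so a key rotation only touches the registry.
pub fn is_authorized(
    acl: &[&str],
    registry: &dyn PeerRegistry,
    presented_fingerprint: &str,
) -> bool {
    acl.iter()
        .any(|name| registry.current_fingerprint(name) == Some(presented_fingerprint))
}
```

With an ACL of `["worker-a"]`, recording a rotation for `worker-a` makes the new fingerprint pass and the old one fail, with no change to the ACL itself. That is the property the ephemeral UUID cannot provide.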