glm-5.2 99c6dd9483 docs(arch): resolve OQ-26 (AdapterError variants) + OQ-33 (PeerId = logical id) + OQ-34 (persistent peer registry)

OQ-26 (resolved): AdapterError variants decided — DiscoveryFailed,
SchemaParse, Transport, Unauthorized, SamePeerCollision (replaces flat
Conflict per ADR-029 §5). #[non_exhaustive] for downstream extension.
Two-way door; the initial set is the code's return type.

OQ-33 (resolved): PeerId is a logical identifier, NOT Identity.id. The
research's v1 default (PeerId = fingerprint) is overridden: coupling PeerId
to crypto material breaks every in-flight PeerRef::Specific and every ACL
entry on key rotation. v1 source is a connection-assigned UUID — a
no-storage workaround that works for the immediate use case (head→workers,
reconnect produces fresh PeerRef, in-flight gets NOT_FOUND which is correct).
The one-way door: PeerId is logical, not crypto — this determines
PeerCompositeEnv key type and PeerRef::Specific payload. The id source
(UUID vs configured name vs peer registry) is the two-way-door remainder.

OQ-34 (new): the storage dimension OQ-33 surfaced. The core crates are
deliberately DB-free (smaller, fewer deps, simpler testing) — this served
local-only state (vault, registry) well, but peer identity is the first
cross-node state that wants persistence. The real solution (a persistent
peer registry mapping stable logical name → current crypto material,
surviving key rotation) is not a v1 blocker (UUID works), but tracked so the
no-DB posture's limit is deliberate, not accidental. The storage boundary
(core gets a PeerRegistry trait vs stays storage-free) is the one-way door;
the backend choice is two-way. Key-rotation/ACL note: decoupling PeerId from
crypto keeps the door open for ACL entries that persist across key rotation
— when the peer registry is built, ACLs key on the logical name and key
rotation becomes vault-only with no remote-side ACL update.

2026-06-27 06:34:35 +00:00

status	last_updated
draft	2026-06-26

Open Questions

Questions are organized by theme. Each question has a stable OQ-ID for cross-referencing from spec documents.

Door type classifications follow ADR-009:

One-way door: Reversal requires rewriting significant code or permanently closes a capability. Requires ADR before implementation.
Two-way door: Reversal is cheap or additive. Can be decided during implementation.

Theme: Core Types

OQ-01: BiStream Type Definition

Origin: overview.md
Status: resolved
Door type: One-way
Priority: high
Resolution: BiStream is a trait (AsyncRead + AsyncWrite + Send + Unpin). Handlers receive a Connection (not a single BiStream). This preserves the WASM door — browser clients can implement BiStream over WebTransport streams. See ADR-007.
Cross-references: ADR-002, ADR-007, ADR-009

OQ-02: AuthContext Resolution Timing

Origin: overview.md
Status: resolved
Door type: One-way
Priority: high
Resolution: Hybrid model (Option C) — endpoint resolves what it can (e.g., TLS client certificate), handler resolves what it must (e.g., AuthToken in first frame). AuthContext may be partial when handle() is called. See ADR-004.
Cross-references: ADR-002, ADR-004

Theme: ALPN and Routing

OQ-03: ALPN String Naming Convention

Origin: overview.md
Status: resolved
Door type: One-way
Priority: medium
Resolution: Custom ALPNs use alknet/<name> prefix (no version), standard ALPNs use IANA strings. No version negotiation initially. See ADR-006.
Cross-references: ADR-001, ADR-006
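
The OQ-03 naming rule can be sketched as a pair of small helpers. This is a minimal illustration, not code from alknet-core: the function names are hypothetical, and rejecting empty names or names containing `/` is an assumption of the sketch rather than something ADR-006 is known to require.

```rust
/// Prefix for custom (non-IANA) ALPN strings, per the OQ-03 resolution.
const CUSTOM_ALPN_PREFIX: &str = "alknet/";

/// Builds a custom ALPN string such as `alknet/call` from a bare name.
/// Hypothetical helper: the validation rule (non-empty, no `/`) is an
/// assumption made for this sketch.
fn custom_alpn(name: &str) -> Option<String> {
    if name.is_empty() || name.contains('/') {
        return None;
    }
    Some(format!("{}{}", CUSTOM_ALPN_PREFIX, name))
}

/// Returns true when an ALPN string has the custom `alknet/<name>` form.
/// Anything else is treated as a standard (IANA) string by this sketch.
fn is_custom_alpn(alpn: &str) -> bool {
    alpn.strip_prefix(CUSTOM_ALPN_PREFIX)
        .map_or(false, |rest| !rest.is_empty() && !rest.contains('/'))
}
```

Under these assumptions, `custom_alpn("call")` yields `alknet/call`, while a standard string such as `h3` is not classified as custom.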

OQ-04: Dynamic Handler Registration at Runtime vs Static at Startup

Origin: overview.md
Status: resolved
Door type: Two-way
Priority: low
Resolution: Static registration at startup. HandlerRegistry is immutable after construction. ALPN strings in the TLS ServerConfig are derived from the registry at startup — adding a handler at runtime requires rebuilding the TLS config. The ArcSwap<HandlerRegistry> pattern can be applied later if needed (two-way door). See ADR-010.

Scope clarification (ADR-024): This resolution applies to the HandlerRegistry (ALPN string → ProtocolHandler), which is what ADR-010 governs. The call protocol's OperationRegistry (operation name → HandlerRegistration) is a separate registry living inside the CallAdapter, behind the single ALPN alknet/call. Its mutability profile is governed by ADR-024, not by this OQ. ADR-024 layers the operation registry by trust boundary: curated Local ops are immutable (same rationale as here — composing ops are privileged, the startup trust boundary is where their authority is granted); Session and imported (FromCall etc.) ops are dynamic at their respective trust-boundary scopes (session, connection). The pre-ADR-024 blanket immutability claim in operation-registry.md was inherited by analogy from this OQ and did not actually apply — the TLS-config argument that justifies HandlerRegistry immutability does not touch the OperationRegistry.
Cross-references: ADR-001, ADR-010, ADR-024, endpoint.md, operation-registry.md
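
A minimal sketch of the "immutable after construction" shape described in OQ-04, using a consuming builder. The `ProtocolHandler` trait here is a stand-in with an invented method, the builder type is hypothetical, and `alpn_protocols` only illustrates how the ALPN list for a TLS config could be derived from the registry at startup.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for the real handler trait; the actual `ProtocolHandler`
/// signature is not shown in this document.
trait ProtocolHandler: Send + Sync {
    fn name(&self) -> &'static str;
}

/// Mutable only while building; consumed by `build()`.
#[derive(Default)]
struct HandlerRegistryBuilder {
    handlers: HashMap<String, Arc<dyn ProtocolHandler>>,
}

impl HandlerRegistryBuilder {
    fn register(mut self, alpn: &str, handler: Arc<dyn ProtocolHandler>) -> Self {
        self.handlers.insert(alpn.to_string(), handler);
        self
    }

    fn build(self) -> HandlerRegistry {
        HandlerRegistry { handlers: self.handlers }
    }
}

/// Exposes no `&mut self` methods, so it cannot change after construction.
struct HandlerRegistry {
    handlers: HashMap<String, Arc<dyn ProtocolHandler>>,
}

impl HandlerRegistry {
    /// Dispatch lookup by negotiated ALPN string.
    fn get(&self, alpn: &str) -> Option<&Arc<dyn ProtocolHandler>> {
        self.handlers.get(alpn)
    }

    /// ALPN byte strings derived from the registered handlers, sorted so
    /// the output is deterministic.
    fn alpn_protocols(&self) -> Vec<Vec<u8>> {
        let mut alpns: Vec<Vec<u8>> = self
            .handlers
            .keys()
            .map(|k| k.as_bytes().to_vec())
            .collect();
        alpns.sort();
        alpns
    }
}
```

Because the ALPN list is computed from the finished registry, adding a handler later would mean building a new registry and a new TLS config, which is the coupling the resolution describes.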

Theme: Transport and Endpoint

OQ-05: Multi-Connectivity Endpoint

Origin: overview.md
Status: resolved
Door type: One-way
Priority: high
Resolution: AlknetEndpoint supports both quinn::Endpoint (public QUIC+TLS) and iroh::Endpoint (P2P relay-assisted) simultaneously, both optional and feature-gated. Both produce QUIC connections that dispatch through the same HandlerRegistry by ALPN string. These are not interchangeable transports — they serve fundamentally different deployment contexts (public IP vs NAT traversal). TCP is not an endpoint concern — bare TCP SSH is handled by the SSH handler directly. See ADR-010.
Cross-references: ADR-001, ADR-010, endpoint.md

OQ-06: Server-Side ALPN vs Client-Side ALPN

Origin: ADR-001
Status: resolved
Door type: One-way
Priority: low
Resolution: One ALPN per connection. Clients open one QUIC connection per ALPN. QUIC connections are cheap (multiplexed over the same UDP flow). See ADR-006.
Cross-references: ADR-001, ADR-006

Theme: Call Protocol

OQ-07: Call Protocol Scope Within a Connection

Origin: ADR-005
Status: resolved
Door type: Two-way
Priority: medium
Resolution: The call protocol uses bidirectional QUIC streams with EventEnvelope framing and ID-based correlation via PendingRequestMap. The protocol is stream-agnostic — the client can open one stream per operation, multiplex on one stream, or any mix. Correlation is by request ID, not by stream. Both sides can initiate calls. One alknet/call connection gives access to the full operation registry (call, subscribe, batch, schema). No multiplexing layer is needed inside the connection. See ADR-012.
Cross-references: ADR-005, ADR-012
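
The ID-based correlation described in OQ-07 can be sketched with a map from request ID to a waiting channel. This is a simplified, synchronous illustration: the `Response` struct and the method names are assumptions, and `std::sync::mpsc` stands in for the tokio oneshot channels the real `PendingRequestMap` is said to use.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Simplified response event; the real EventEnvelope carries more fields.
#[derive(Debug, Clone, PartialEq)]
struct Response {
    request_id: u64,
    payload: String,
}

/// Correlates responses to callers by request ID, independent of which
/// stream carried them.
#[derive(Default)]
struct PendingRequestMap {
    next_id: u64,
    pending: HashMap<u64, Sender<Response>>,
}

impl PendingRequestMap {
    /// Allocates a request ID and returns the receiver the caller waits on.
    fn register(&mut self) -> (u64, Receiver<Response>) {
        let id = self.next_id;
        self.next_id += 1;
        let (tx, rx) = channel();
        self.pending.insert(id, tx);
        (id, rx)
    }

    /// Routes an incoming response to its waiter. Returns false when the ID
    /// is unknown (already completed or never issued) or the waiter is gone.
    fn complete(&mut self, response: Response) -> bool {
        match self.pending.remove(&response.request_id) {
            Some(tx) => tx.send(response).is_ok(),
            None => false,
        }
    }
}
```

Responses may arrive in any order and on any stream; each one reaches the caller that registered its ID, which is also why the client-side batch pattern in OQ-14 needs no dedicated event type.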

Theme: Security

OQ-08: Vault Integration Point

Origin: overview.md
Status: resolved
Door type: One-way
Priority: medium
Resolution: CLI-embedded, assembly-layer only. The CLI binary instantiates VaultServiceHandle locally at startup, derives and decrypts the credentials each handler needs, and injects them into handler capabilities. alknet-vault has no ALPN, no alknet-core dependency, and no operations registered in the call protocol. The master seed and derived private keys never cross the network. The vault is a capability source, not a network service. See ADR-008 and ADR-014.
Cross-references: ADR-003, ADR-005, ADR-008, ADR-014

Deferred Questions

These questions are acknowledged but not active. They will be promoted to open when their crate is being specified.

OQ-09: WASM Target Boundaries

Origin: overview.md
Status: deferred
Door type: One-way (when applicable)
Priority: low
Resolution: Not an active question. WASM compatibility is a design constraint (see ADR-009, overview.md design principles), not a deliverable. Specific WASM targeting decisions will be made when individual crates are implemented. BiStream being a trait preserves the client-side stream door: a browser can implement BiStream over WebTransport streams. The server-side dispatch door is NOT preserved by ADR-007 and is a known, accepted closure: Connection is a concrete quinn-bound struct (not a trait), the accept loop uses tokio::spawn (tokio has only limited support for WASM targets), and the call-protocol dispatch internals (PendingRequestMap, CallAdapter) use tokio oneshot/mpsc channels. A WASM server-side peer would require a Connection trait and a runtime-abstracted accept loop, neither of which is planned. The browser path is client-side via a JS SDK, not server-side Rust-to-WASM. This is an explicit one-way door, not an oversight.
Cross-references: ADR-007, ADR-009

OQ-10: Git Adapter Scope — Smart Protocol Only or Full Server?

Origin: overview.md
Status: deferred
Door type: Two-way
Priority: low
Resolution: Deferred per the cleanup plan. Start with git smart protocol over QUIC streams. ERC721 integration and full server capabilities are additive. Composability fork (review #002 W18): whether git operations are registered in the OperationRegistry and callable via env.invoke(), or only available as raw smart protocol on alknet/git, is a separate decision from ERC721 scope. The path of least resistance (raw smart protocol only) forecloses agent composition of git operations: an agent handler that wants to compose git/clone cannot, because there is no OperationSpec, no Handler, and no registration. To make git composable, a call-protocol projection (a set of HandlerRegistration bundles wrapping git operations behind the registry) must be built alongside or instead of the raw handler. Resolve this when speccing alknet-git; do not defer it past that point.
Cross-references: ADR-001

Theme: alknet-core

OQ-11: Handler-Level Auth Resolution Observability

Origin: auth.md
Status: resolved
Door type: Two-way
Priority: medium
Resolution: Option B — handlers store resolved identity on the Connection. When a handler resolves identity inside handle() (the handler-level auth phase), it calls connection.set_identity(identity) to store the resolved Identity on the connection object. The endpoint and observability layers can read it later for connection logging, audit trails, and metrics.

Why not Option A (return identity from handle()): it changes the ProtocolHandler trait signature for all handlers, even those that don't do auth resolution (DNS, health check). It also assumes one identity per connection — but the call protocol can have different identities per request on the same connection (one connection, multiple call.requested events with different auth tokens). Returning a single identity from handle() would be misleading for the call protocol.

Why not Option C (identity stays local): the resolved identity is useful beyond the handler. The endpoint may want to log "connection from X authenticated as Y." A connection-level observability layer needs the identity. If it stays local, every handler that resolves identity would need to duplicate logging logic, and the endpoint can't correlate connections to identities.

Two identity scopes exist and must not be conflated:
- Connection-level identity (this decision): set once by the handler in handle(), stored on Connection, read by the endpoint for logging/observability. This is the "connection owner" — who opened this QUIC connection.
- Per-request identity (already in the call protocol spec): set per call.requested by the CallAdapter, stored on OperationContext.identity. This is the "call caller" — who is making this specific call, which may upgrade mid-session (different auth tokens on the same connection).
Both exist. The connection-level identity is the stable "who is this connection from"; the per-request identity is the dynamic "who is this specific call from." The call protocol's per-request resolution (which may produce a different identity than the connection-level resolution) takes precedence for ACL on OperationContext — the connection-level identity is for observability only, not for ACL.

C13 resolution (review #002): the endpoint does not read identity() after handle() returns. The Connection is moved into the spawned handler task (endpoint.md), so the endpoint no longer has a reference to it. Connection-level observability (remote addr, ALPN, connection ID) is logged by the endpoint before the move. Identity-level observability is logged by the handler (the handler knows which identity it resolved and can log it). There is no Arc<Connection> sharing or channel-based identity-reporting mechanism — the simplest honest answer that avoids over-engineering the observability path before there's a demonstrated need. If a future use case requires the endpoint to correlate connections to identities, an Arc<Connection> or a side-channel can be added then.
Cross-references: ADR-004, ADR-011, ADR-015 (per-request identity on OperationContext), auth.md
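
Option B's "set once by the handler" behaviour might look like the following. This is a sketch under stated assumptions: the real `Connection` is a quinn-bound struct whose fields are not shown here, the `Identity` shape is reduced to a single `id` field, and rejecting a second `set_identity` call is a choice made for this example only.

```rust
use std::sync::OnceLock;

/// Reduced identity type; only the `id` field is modelled here.
#[derive(Debug, Clone, PartialEq)]
struct Identity {
    id: String,
}

/// Simplified connection: the real type wraps a QUIC connection.
struct Connection {
    remote_addr: String,
    identity: OnceLock<Identity>,
}

impl Connection {
    fn new(remote_addr: &str) -> Self {
        Connection {
            remote_addr: remote_addr.to_string(),
            identity: OnceLock::new(),
        }
    }

    /// Stores the connection-level identity resolved inside `handle()`.
    /// Assumption of this sketch: a second call is rejected and the
    /// rejected value is handed back to the caller.
    fn set_identity(&self, identity: Identity) -> Result<(), Identity> {
        self.identity.set(identity)
    }

    /// Read side, used for logging and observability only (not for ACL).
    fn identity(&self) -> Option<&Identity> {
        self.identity.get()
    }
}
```

Per-request identity is deliberately absent from this type: it lives on `OperationContext` and is what ACL checks consult.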

OQ-12: TLS Identity Provisioning in AlknetEndpoint

Origin: endpoint.md, config.md
Status: resolved
Door type: One-way
Priority: high
Resolution: TLS identity in alknet has two distinct use cases, not one:

Use case 1 — P2P / key-based identity (default for most alknet nodes): RFC 7250 raw Ed25519 public keys. No domain, no CA, no cert renewal. The Ed25519 public key IS the node's identity. This is the same model iroh uses with its NodeId. It works natively with SSH auth (same key type) and git (SSH key-based auth). TlsIdentity::RawKey(Ed25519SecretKey) in StaticConfig covers this. As of ADR-027, RawKey uses ed25519_dalek::SigningKey (via an alknet-core wrapper), not iroh::SecretKey — so raw-key TLS identity is available in quinn-only builds without the iroh feature.

Use case 2 — Domain-hosted services (relays, public-facing nodes): X.509 certificates with domain names. Required for browser/WebTransport clients, which don't support RFC 7250. This has two sub-cases:
- Manual: Provide cert/key file paths via TlsIdentity::X509. Already specified in StaticConfig.
- ACME auto-provisioning: Let's Encrypt via rustls-acme. TlsIdentity::Acme { domains, cache_dir, directory, contact } carries static config; the endpoint constructs the AcmeState async state machine at setup time. Feature-gated behind acme. Designed in ADR-027. The reverse-proxy project (/workspace/@alkdev/reverse-proxy) demonstrates the proven pattern: AcmeConfig, ResolvesServerCertAcme, TLS-ALPN-01 challenge handling, automatic renewal.
Browser constraint: Browsers require X.509 and don't support RFC 7250. For browser/WebTransport clients, domain-hosted nodes with X.509 certs are mandatory. All other clients (SSH, git, alknet-native) work with raw keys by default.

The TlsIdentity enum in StaticConfig captures all four modes (X509, RawKey, SelfSigned, Acme). ADR-027 records the design decisions for ACME integration and RawKey decoupling.
Cross-references: ADR-010, ADR-027, config.md, endpoint.md
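
The four `TlsIdentity` modes named in OQ-12 could be modelled roughly as below. Only the variant names and the `Acme` field names come from the text above; every field type, the `X509` field names, the placeholder key type, and both helper methods are assumptions made for illustration.

```rust
use std::path::PathBuf;

/// Placeholder for the alknet-core wrapper around
/// `ed25519_dalek::SigningKey`; the real type is not reproduced here.
struct Ed25519SecretKey([u8; 32]);

/// Sketch of the four TLS identity modes. Field types are assumed.
enum TlsIdentity {
    /// Manually provided certificate and key files.
    X509 { cert_path: PathBuf, key_path: PathBuf },
    /// RFC 7250 raw Ed25519 public key identity.
    RawKey(Ed25519SecretKey),
    /// Self-signed certificate.
    SelfSigned,
    /// ACME auto-provisioning; static config only.
    Acme {
        domains: Vec<String>,
        cache_dir: PathBuf,
        directory: String,
        contact: Vec<String>,
    },
}

impl TlsIdentity {
    /// Cargo feature this mode needs, if any. Only `Acme` is described
    /// as feature-gated (behind `acme`).
    fn required_feature(&self) -> Option<&'static str> {
        match self {
            TlsIdentity::Acme { .. } => Some("acme"),
            _ => None,
        }
    }

    /// True for the RFC 7250 mode, which browser/WebTransport clients
    /// cannot use.
    fn uses_raw_public_key(&self) -> bool {
        matches!(self, TlsIdentity::RawKey(_))
    }
}
```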

OQ-13: Operation Path Format and Routing Scope

Origin: operation-registry.md
Status: resolved
Door type: Two-way
Priority: medium
Resolution: alknet-call uses /{service}/{op} (e.g., /fs/readFile, /agent/chat, /services/list). This is the correct format for the alknet-call crate — it is not a "Phase 1 simplification" but the right design for this architecture. The /{node}/{service}/{op} pattern from the reference implementation served a head/worker routing model that is a separate architectural concern. Remote dispatch (federation / node-level routing) would be a different mechanism at a different layer, not a prefix added to alknet-call's operation paths. If remote dispatch is ever needed, it would be addressed by a separate crate or a routing layer above the operation registry, not by changing alknet-call's path format. Two-way door — the path format can be extended later if needed, but /{service}/{op} is the correct design now.
Cross-references: ADR-005, ADR-012
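
As an illustration of the `/{service}/{op}` format, a strict parser could look like this. The type and function names are hypothetical, and rejecting extra segments (such as a leading node name) is this sketch's reading of the resolution, not a documented validation rule.

```rust
/// Borrowed view of a parsed operation path.
#[derive(Debug, PartialEq)]
struct OperationPath<'a> {
    service: &'a str,
    op: &'a str,
}

/// Parses `/{service}/{op}`. Returns `None` for a missing leading slash,
/// an empty segment, or more than two segments.
fn parse_operation_path(path: &str) -> Option<OperationPath<'_>> {
    let rest = path.strip_prefix('/')?;
    let (service, op) = rest.split_once('/')?;
    if service.is_empty() || op.is_empty() || op.contains('/') {
        return None;
    }
    Some(OperationPath { service, op })
}
```

With this shape, `/fs/readFile` parses into service `fs` and op `readFile`, while a three-segment `/{node}/{service}/{op}` path is rejected rather than silently reinterpreted.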

OQ-14: Batch Operation Semantics

Origin: call-protocol.md
Status: resolved
Door type: Two-way
Priority: low
Resolution: Batch is a client-side pattern — multiple call.requested events with correlated IDs, responses arrive independently. This is the correct protocol design, not a simplification to be "upgraded" later. QUIC's stream multiplexing already provides the concurrency and ordering guarantees that batch would need. Batch-specific event types (e.g., batch.requested, batch.responded) would add protocol complexity without clear benefit over sending multiple call.requested events. If a compelling use case for atomic batch semantics emerges, it can be added as a new event type without breaking existing clients. Two-way door.
Cross-references: ADR-012

Theme: alknet-call

OQ-15: Call Protocol Client and Adapter Contract

Origin: call-protocol.md, operation-registry.md, ADR-013
Status: resolved
Door type: One-way
Priority: high
Resolution: CallClient opens QUIC connections and shares the dispatch loop with CallAdapter — both sides can send and receive call.requested once connected. Connection direction (who opened the connection) is independent of call direction (who calls whom). from_call adapter discovers remote operations via services/list + services/schema and registers them with forwarding handlers — same pattern as from_openapi and from_mcp. to_openapi and to_mcp project local operations to external protocols. Adapter contract trait (OperationAdapter) produces (OperationSpec, Handler) pairs. Cross-node call tree: abort cascade (ADR-016) propagates across node boundaries through from_call handlers. Credentials for connections come from capabilities (ADR-014). Adapter-registered operations are Internal by default (ADR-015). See ADR-017.
Cross-references: ADR-005, ADR-013, ADR-014, ADR-015, ADR-016, ADR-017, call-protocol.md, operation-registry.md

OQ-16: Safe Vault Operations for Call Protocol Exposure

Origin: operation-registry.md, ADR-008
Status: resolved
Door type: One-way
Priority: high
Resolution: No vault operations are exposed over the call protocol for now. The vault is accessed only at the assembly layer (CLI binary at startup). Handlers receive secret material through OperationContext.capabilities, not by calling vault operations over the wire. The operation-registry.md spec previously showed vault/derive, vault/unlock, and vault/decrypt registered as call protocol operations — that was a contradiction with ADR-008's "capability source" model and has been corrected. If a future use case requires exposing a vault operation over the call protocol (e.g., a restricted vault/public-key operation that returns only public key material for identity verification), it would require its own ADR with an explicit threat model justification. See ADR-014.
Cross-references: ADR-008, ADR-014, operation-registry.md

OQ-17: Abort Cascade Semantics for Nested Calls

Origin: call-protocol.md, operation-registry.md
Status: resolved
Door type: One-way (protocol schema), two-way (mechanism)
Priority: high
Resolution: call.aborted cascades to all non-terminal descendants in the call tree. The CallAdapter walks the tree (indexed by parent_request_id in PendingRequestMap) and sends call.aborted for each descendant. Default policy is abort-dependents (abort everything downstream); continue-running is an opt-in for long-running work that should survive a parent's abort. Handlers clean up via Rust's async drop semantics (future dropped → Drop guards release resources). The cascade is protocol-level (server discovers descendants and propagates); the mechanism (parent-indexed map, cancellation tokens, or a separate graph) is a two-way door. See ADR-016.
Cross-references: ADR-012, ADR-015, ADR-016, call-protocol.md, operation-registry.md
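
The cascade walk can be sketched over a parent-indexed map. The struct and enum names below are hypothetical, and one behaviour is an assumption the text does not settle: a `ContinueRunning` call is treated as shielding its whole subtree from the cascade.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, Copy, PartialEq)]
enum AbortPolicy {
    AbortDependents,
    ContinueRunning,
}

/// Minimal view of an in-flight call for cascade purposes.
struct PendingCall {
    parent_request_id: Option<u64>,
    policy: AbortPolicy,
    terminal: bool,
}

/// Returns the request IDs that should receive `call.aborted` when `root`
/// is aborted: every non-terminal descendant not opted out.
fn abort_cascade(calls: &HashMap<u64, PendingCall>, root: u64) -> Vec<u64> {
    // Build a parent -> children index from parent_request_id.
    let mut children: HashMap<u64, Vec<u64>> = HashMap::new();
    for (&id, call) in calls {
        if let Some(parent) = call.parent_request_id {
            children.entry(parent).or_default().push(id);
        }
    }

    let mut to_abort = Vec::new();
    let mut seen = HashSet::new();
    let mut stack = vec![root];
    while let Some(current) = stack.pop() {
        for &child in children.get(&current).into_iter().flatten() {
            if !seen.insert(child) {
                continue; // guard against malformed (cyclic) input
            }
            let call = &calls[&child];
            if call.policy == AbortPolicy::ContinueRunning {
                continue; // assumption: opt-out also protects its subtree
            }
            if !call.terminal {
                to_abort.push(child);
            }
            stack.push(child); // a finished call may still have live children
        }
    }
    to_abort.sort_unstable();
    to_abort
}
```

The indexing strategy is the two-way-door part: swapping this map walk for cancellation tokens would not change which `call.aborted` events appear on the wire.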

OQ-18: Privilege Model and Authority Context

Origin: operation-registry.md
Status: resolved
Door type: One-way (ACL model), two-way (specific APIs)
Priority: high
Resolution: The internal flag on OperationContext marks calls that originated from composition (a handler calling another operation via OperationEnv), as opposed to external calls that arrived as call.requested from a wire client. The internal flag switches the authority context: the ACL check runs against the composing handler's identity (set at registration), not the caller's identity and not as a blanket skip. This replaces the previous trusted flag, which skipped ACL entirely — a privilege escalation vector. Operations have External/Internal visibility. Internal operations return NOT_FOUND when called from the wire and are excluded from services/list. The composition env is scoped — a handler can only invoke a declared set of operations. Handler identity is carried on OperationContext alongside caller identity (the principal/agent pair). See ADR-015.
Cross-references: ADR-014, ADR-015, call-protocol.md, operation-registry.md
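
A sketch of how the internal flag selects the identity the ACL check runs against, instead of skipping the check. The scope-string ACL, the field names, and the `authorize` function are assumptions for illustration; ADR-022 later replaces the handler identity with a composition authority bundle, which this example does not model.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Identity {
    id: String,
    scopes: Vec<String>,
}

#[derive(Clone, Copy, PartialEq)]
enum Visibility {
    External,
    Internal,
}

/// Reduced context: only the fields this sketch needs.
struct OperationContext {
    /// True when the call originated from composition, false from the wire.
    internal: bool,
    caller_identity: Option<Identity>,
    handler_identity: Option<Identity>,
}

#[derive(Debug, PartialEq)]
enum DispatchError {
    NotFound,
    Forbidden,
}

fn authorize(
    ctx: &OperationContext,
    visibility: Visibility,
    required_scope: &str,
) -> Result<(), DispatchError> {
    // Internal operations do not exist as far as wire callers can tell.
    if visibility == Visibility::Internal && !ctx.internal {
        return Err(DispatchError::NotFound);
    }
    // The flag switches the authority context; it never skips the check.
    let subject = if ctx.internal {
        ctx.handler_identity.as_ref()
    } else {
        ctx.caller_identity.as_ref()
    };
    match subject {
        Some(id) if id.scopes.iter().any(|s| s == required_scope) => Ok(()),
        _ => Err(DispatchError::Forbidden),
    }
}
```

Note that an internal call with no handler identity is denied here rather than waved through, which is the failure mode OQ-23 goes on to address.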

OQ-19: Session-Scoped Operation Registries and Agent-Written Operations

Origin: operation-registry.md
Status: resolved
Door type: Two-way (protocol doesn't need changes), one-way (if implementation closes the door)
Priority: medium

Resolution: The call protocol supports session-scoped registries through OperationEnv trait layering. No protocol changes needed. The pattern is documented here and in operation-registry.md to prevent an implementation from accidentally closing it.

The registry model has three tiers:

| Tier | Scope | Lifetime | Visibility | Who populates it |
| --- | --- | --- | --- | --- |
| Core (global) | All sessions | Process lifetime, static at startup | External + Internal (curated) | Assembly layer at startup |
| Session | One session | Session lifetime, dynamic | Internal only (never wire-facing) | Agent during session (sandbox) |
| Promotion | Session → Core | One-time transition | Manual/curated review | Human or architect agent reviews, then redeploys |

Session-scoped operations are always Internal (ADR-015), run under the handler's identity (the agent handler that authorized the sandbox), can only compose operations in the handler's scoped env, and are ephemeral (gone when the session ends). Core operations are curated — reviewed before promotion. The promotion path is the curation checkpoint where autonomous (session-scoped) becomes curated (core). This is not auto-promotion.

Implementation guard: OperationEnv must remain a trait, not a concrete type. A session-scoped env wraps the global env (check session registry first, fall through to global). Making OperationEnv concrete or hardcoding the global registry into the dispatch path would close this pattern. The static registration constraint (OQ-04) applies to the curated (Layer 0) registry only; session registries are dynamic by nature and are a different registry overlaying the curated one. Generalized by ADR-024: connection-scoped remote imports (from_call) use the same overlay mechanism as session-scoped ops. Both are per-scope dynamic overlays on the static curated base, composed into the per-call OperationContext.env by the CallAdapter. OperationEnv being a trait object (Arc<dyn OperationEnv + Send + Sync>) is what enables both overlay patterns.
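
The overlay described above can be sketched as a session env that wraps a base env behind the trait. This is a simplified illustration: the real `invoke` is presumably async and typed, whereas this version is synchronous over strings, and `GlobalEnv`, `SessionEnv`, and `OpFn` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified operation signature used only in this sketch.
type OpFn = fn(&str) -> String;

/// Kept as a trait so that envs can be layered.
trait OperationEnv: Send + Sync {
    fn invoke(&self, path: &str, input: &str) -> Result<String, String>;
}

/// Curated base registry, fixed at startup.
struct GlobalEnv {
    ops: HashMap<String, OpFn>,
}

impl OperationEnv for GlobalEnv {
    fn invoke(&self, path: &str, input: &str) -> Result<String, String> {
        match self.ops.get(path) {
            Some(op) => Ok(op(input)),
            None => Err(format!("NOT_FOUND: {}", path)),
        }
    }
}

/// Per-session overlay: session ops first, then fall through to the base.
struct SessionEnv {
    session_ops: HashMap<String, OpFn>,
    base: Arc<dyn OperationEnv>,
}

impl OperationEnv for SessionEnv {
    fn invoke(&self, path: &str, input: &str) -> Result<String, String> {
        match self.session_ops.get(path) {
            Some(op) => Ok(op(input)),
            None => self.base.invoke(path, input),
        }
    }
}
```

Because `SessionEnv` only holds an `Arc<dyn OperationEnv>`, the same wrapper can sit on top of another overlay (for example a connection-scoped import layer) without either layer knowing about the other.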

Session-scoped operations run in a locked-down sandbox (no direct net/fs/env access), can only reach operations in the handler's scoped env, and their output should be validated against their declared schema before returning. The promotion path requires review — an agent with a promote scope (the architect role) performs the promotion; the writing agent (lower-privileged role) requests it. This is the role-based escalation pattern (ADR-015): privileges escalate through a chain of command, not through direct authority.

The agent-specific mechanism (quickjs sandbox, session registry lifecycle, promotion workflow) belongs to the agent crate spec. The call protocol's job is to keep the OperationEnv trait composable and the visibility/ACL model consistent across tiers.

Cross-references: OQ-04, ADR-014, ADR-015, ADR-016, ADR-024, operation-registry.md

Theme: alknet-vault

OQ-20: Salt/KDF and Encryption Key Derivation Method

Origin: encryption.md
Status: resolved
Door type: One-way (key derivation method), two-way (salt field usage)
Priority: high
Resolution: The vault uses SLIP-0010 HD derivation from the BIP39 seed at path m/74'/2'/0'/0' to produce the AES-256-GCM encryption key — not PBKDF2. The salt field in EncryptedData is unused for key derivation (kept for wire-format compatibility with the TS predecessor). The TypeScript @alkdev/storage crypto module used PBKDF2 with a password + salt; data encrypted by that method (key_version=1) cannot be decrypted by the vault and must be migrated via one-time re-encryption to key_version=2. See ADR-020 for the full rationale and migration path.
Cross-references: ADR-020, encryption.md

OQ-21: Remote Vault Administration

Origin: service.md, protocol.md, ADR-019
Status: resolved
Door type: One-way (vault crate is local-only by construction)
Priority: medium
Resolution: Remote vault access is not a feature of the vault crate. ADR-025 dropped irpc from the vault, making the vault local-only by construction — no RemoteService trait, no wire format for vault messages, no default-insecure remote handler. The vault's API is VaultServiceHandle (direct method calls), nothing else.

If remote vault access is ever needed (e.g., the machine→worker pattern), it requires a separate vault-server crate that depends on both alknet-core (for IdentityProvider, scopes, auth-wrapping) and alknet-vault (for VaultServiceHandle). That crate would define its own threat model, access policy, operation filtering (Unlock/Lock local-only), and wire format — and requires its own ADR. This is a deliberate addition, not a flag flip on a default that was already loaded.

The pre-ADR-025 deferral framed remote access as "non-breaking" (the wire format was additive). That framing was misleading: once workers build dependencies on the remote vault API, disabling it breaks them — the door is operationally one-way even if the wire format is additive. ADR-025 inverts the default: the vault is local-only by construction, and remote access requires building something new, not removing a default.

Per-node vaults are the recommended pattern for multi-node deployments: each node has its own vault and mnemonic; credentials are encrypted for the receiving node's public key, not decrypted centrally. This is end-to-end encryption between nodes, matching ADR-008's "capability source" model.
Cross-references: ADR-005, ADR-008, ADR-014, ADR-018, ADR-019, ADR-025, protocol.md, service.md

OQ-22: Key Rotation Mechanism

Origin: encryption.md
Status: resolved
Door type: One-way (path scheme), two-way (rotation policy)
Priority: medium
Resolution: Key rotation uses version-indexed derivation paths. Each key version maps to a distinct SLIP-0010 path: m/74'/2'/0'/{version-2}'. v2 (current) is at m/74'/2'/0'/0'; v3 is at m/74'/2'/0'/1'; etc. The decrypt method derives the key at the path indicated by encrypted.key_version (not always at PATHS::ENCRYPTION). The rotate method decrypts with the old version's key and re-encrypts with the new version's key — no new mnemonic needed. The assembly layer or a migration tool iterates stored blobs and calls rotate on each; the vault does not self-rotate. Partial rotation is safe (old keys remain derivable). See ADR-021.
Cross-references: ADR-020, ADR-021, encryption.md, service.md
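
The version-to-path mapping is simple enough to state as a function. The function name is hypothetical; the arithmetic follows the `{version-2}` rule above, and versions below 2 return `None` because key_version=1 refers to the PBKDF2-based predecessor format, which has no derivation path.

```rust
/// Maps `key_version` to its SLIP-0010 derivation path.
/// Returns `None` for versions below 2 and for indexes that do not fit
/// the hardened range (0 to 2^31 - 1).
fn encryption_path(key_version: u32) -> Option<String> {
    let index = key_version.checked_sub(2)?;
    if index >= (1u32 << 31) {
        return None;
    }
    Some(format!("m/74'/2'/0'/{}'", index))
}
```

A decrypt routine following the resolution would call this with `encrypted.key_version` rather than assuming the current path, which is what keeps partially rotated data readable.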

OQ-23: Handler Identity Registration Path and Composition Authority

Origin: operation-registry.md, call-protocol.md, ADR-015
Status: resolved
Door type: One-way (security model), two-way (bundle shape)
Priority: high
Resolution: ADR-015 said handler identity was "set at registration by the assembly layer" but the registration API (register(spec, handler)) had no place for it — meaning every internal call would check ACL against None, reproducing the escalation gap ADR-015 was written to close. ADR-022 resolves this with a registration bundle (HandlerRegistration) carrying provenance, composition_authority (replacing handler_identity: Identity — it's a declared authority bundle, not a peer identity), scoped_env, and capabilities. The dispatch path (build_root_context and OperationEnv::invoke()) reads from the bundle. Provenance determines which ops can compose: only Local and Session get composition authority; leaves (FromOpenAPI, FromMCP, FromCall) get None — they don't compose, so they don't need it. Capabilities are per-request on OperationContext, populated from the bundle (resolving the closure-capture vs context ambiguity). The kernel/user analogy: user's authority checked once at the External gate; handler's composition authority used for all composition inside; scoped env bounds reachability. No intersection — the user's authority does not limit internal calls. See ADR-022.
Cross-references: ADR-014, ADR-015, ADR-022, docs/reviews/001-pre-implementation-architecture-sanity-check.md (C1–C4), operation-registry.md, call-protocol.md
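
A sketch of the provenance rule: only `Local` and `Session` registrations keep a composition authority, and leaves get `None`. The field names follow the bundle described above, but all field types, the constructor, and `may_invoke` are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Provenance {
    Local,
    Session,
    FromOpenAPI,
    FromMCP,
    FromCall,
}

/// Declared authority bundle (not a peer identity). Shape is assumed.
#[derive(Debug, Clone, PartialEq)]
struct CompositionAuthority {
    scopes: Vec<String>,
}

struct HandlerRegistration {
    provenance: Provenance,
    composition_authority: Option<CompositionAuthority>,
    /// Operation paths this handler is allowed to reach.
    scoped_env: Vec<String>,
    capabilities: Vec<String>,
}

impl HandlerRegistration {
    /// Applies the provenance rule at construction, so a leaf can never
    /// carry composition authority by accident.
    fn new(
        provenance: Provenance,
        requested: CompositionAuthority,
        scoped_env: Vec<String>,
        capabilities: Vec<String>,
    ) -> Self {
        let composition_authority = match provenance {
            Provenance::Local | Provenance::Session => Some(requested),
            Provenance::FromOpenAPI | Provenance::FromMCP | Provenance::FromCall => None,
        };
        HandlerRegistration { provenance, composition_authority, scoped_env, capabilities }
    }

    /// Composition needs both an authority and the target in the scoped env.
    fn may_invoke(&self, path: &str) -> bool {
        self.composition_authority.is_some() && self.scoped_env.iter().any(|p| p == path)
    }
}
```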

OQ-24: Operation Error Schemas

Origin: operation-registry.md, call-protocol.md, ADR-017
Status: resolved
Door type: One-way (wire format), two-way (mapping mechanism)
Priority: high
Resolution: OperationSpec gains error_schemas: Vec<ErrorDefinition> where each ErrorDefinition carries a code, description, schema (JSON Schema for the error detail payload), and optional http_status (for adapter projection). The call.error payload gains an optional details field carrying the typed error payload. Protocol-level codes (NOT_FOUND, FORBIDDEN, INVALID_INPUT, INTERNAL, TIMEOUT) are distinct from operation-level domain codes (FILE_NOT_FOUND, RATE_LIMITED, etc.) — protocol codes are emitted by the dispatch machinery, operation codes by handlers. from_openapi/to_openapi map OpenAPI response status codes to/from ErrorDefinitions, making the adapter contract from ADR-017 faithful on the error axis. services/schema exposes error_schemas for client code generation. See ADR-023.
Cross-references: ADR-017, ADR-023, docs/reviews/001-pre-implementation-architecture-sanity-check.md (C5), operation-registry.md, call-protocol.md
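
The split between protocol-level and operation-level codes, and the optional HTTP projection, can be sketched as follows. The field names follow the `ErrorDefinition` description above; holding the JSON Schema as a plain string, and both helper functions, are simplifications assumed for this example.

```rust
/// One declared operation-level error. The schema is kept as raw JSON
/// text here purely to avoid a JSON dependency in the sketch.
#[derive(Debug, Clone, PartialEq)]
struct ErrorDefinition {
    code: String,
    description: String,
    schema: String,
    http_status: Option<u16>,
}

/// Codes emitted by the dispatch machinery rather than by handlers.
const PROTOCOL_CODES: [&str; 5] =
    ["NOT_FOUND", "FORBIDDEN", "INVALID_INPUT", "INTERNAL", "TIMEOUT"];

fn is_protocol_code(code: &str) -> bool {
    PROTOCOL_CODES.contains(&code)
}

/// Adapter-style projection: the HTTP status declared for an
/// operation-level code, if the operation declares one.
fn http_status_for(defs: &[ErrorDefinition], code: &str) -> Option<u16> {
    defs.iter()
        .find(|d| d.code == code)
        .and_then(|d| d.http_status)
}
```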

Theme: Call Client and Adapters

These open questions are the remainders from the call-completion gap analysis (docs/research/alknet-call-completion/gap-analysis.md, DC-1..4) and the peer-graph routing research (docs/research/alknet-call-peer-routing/findings.md). ADR-029 supersedes ADR-028 and dissolves OQ-25 and the cross-peer half of OQ-28; the remaining two-way-door shape/defaults are recorded in client-and-adapters.md and may be revisited during implementation without a new ADR.

OQ-25: Remote-Safe Marking Shape for CallClient Peer-Scoped Filtering (Dissolved by ADR-029)

Origin: client-and-adapters.md, ADR-017 (§1 Consequences), ADR-028
Status: dissolved (ADR-029)
Door type: ~~Two-way (shape only — existence is one-way, resolved by ADR-028)~~
Priority: ~~medium~~
Resolution: Dissolved by ADR-029. ADR-028's remote_safe: bool / trusted_peer model is superseded: it was a parallel, weaker authorization system that duplicated the existing AccessControl/Identity machinery. ADR-029 retires remote_safe/trusted_peer entirely; peer authorization flows through AccessControl::check(peer_identity). The op's AccessControl is the peer-authorization policy; there is no separate marking. Per-peer differentiation is via IdentityProvider config (different peers get different scopes), not a per-op boolean. The "shape" question is moot because there is no marking to shape. See ADR-029 §3.
Cross-references: ADR-009, ADR-014, ADR-015, ADR-017, ADR-022, ADR-024, ~~ADR-028~~ (superseded), ADR-029, client-and-adapters.md, operation-registry.md

OQ-26: OperationAdapter Error Type (AdapterError Variants)

Origin: client-and-adapters.md, ADR-017 §5, ADR-029 §5
Status: resolved (2026-06-27)
Door type: Two-way
Priority: medium
Resolution: The AdapterError enum is #[non_exhaustive] + thiserror::Error, with these v1 variants:
- DiscoveryFailed { message: String } — from_call remote unreachable / services/list failed
- SchemaParse { message: String } — from_openapi / from_jsonschema couldn't parse the spec
- Transport { message: String } — underlying transport error (QUIC for from_call, HTTP for from_openapi/from_mcp)
- Unauthorized { message: String } — HTTP 401 for from_openapi/from_mcp, auth rejected for from_call
- SamePeerCollision { message: String } — namespace collision within a single peer (ADR-029 §5: cross-peer collision dissolves; same-peer collision stays an error). Replaces the flat Conflict variant from the pre-ADR-029 implementation.
#[non_exhaustive] lets alknet-http's adapters extend without breaking match arms. The variant payloads are String messages — kept simple and Send + Sync by construction. This matches the shipped implementation (crates/alknet-call/src/client/mod.rs) except Conflict → SamePeerCollision (the ADR-029 migration renames it). Two-way door: adding variants later is non-breaking; renaming a variant is a match-arm update but not an architectural change.
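A minimal sketch of the variant set above, for illustration only. It assumes nothing about the shipped code beyond the variant names and String payloads listed in this entry: Display is hand-written here to keep the example dependency-free (the resolution specifies a thiserror::Error derive instead), the message prefixes are invented, and is_retryable and assert_send_sync are hypothetical helpers, not part of alknet-call.

```rust
use std::error::Error;
use std::fmt;

/// Sketch of the v1 variant set. The real enum derives `thiserror::Error`;
/// Display is written by hand here only to avoid the dependency.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AdapterError {
    DiscoveryFailed { message: String },
    SchemaParse { message: String },
    Transport { message: String },
    Unauthorized { message: String },
    SamePeerCollision { message: String },
}

impl fmt::Display for AdapterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The prefixes below are assumptions, not the shipped messages.
        match self {
            AdapterError::DiscoveryFailed { message } => write!(f, "discovery failed: {message}"),
            AdapterError::SchemaParse { message } => write!(f, "schema parse error: {message}"),
            AdapterError::Transport { message } => write!(f, "transport error: {message}"),
            AdapterError::Unauthorized { message } => write!(f, "unauthorized: {message}"),
            AdapterError::SamePeerCollision { message } => write!(f, "same-peer collision: {message}"),
        }
    }
}

impl Error for AdapterError {}

/// Hypothetical caller-side helper. The wildcard arm is the shape a
/// downstream crate has to use against a #[non_exhaustive] enum, which is
/// why adding a variant later does not break its match.
pub fn is_retryable(err: &AdapterError) -> bool {
    match err {
        AdapterError::DiscoveryFailed { .. } | AdapterError::Transport { .. } => true,
        _ => false,
    }
}

/// Compile-time check that a type is Send + Sync.
pub fn assert_send_sync<T: Send + Sync>() {}
```

Because every payload is a plain String, the Send + Sync property follows from the field types alone; assert_send_sync::<AdapterError>() compiles without any unsafe impl.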
Cross-references: ADR-017, ADR-029, client-and-adapters.md

OQ-27: from_call Re-Import Trigger

Origin: client-and-adapters.md, ADR-017 Assumption 4
Status: open
Door type: Two-way
Priority: low
Resolution: ADR-017 Assumption 4 noted re-import "happens on reconnection or is triggered explicitly." The v1 default is auto-re-import on connection establishment. The overlay is per-connection (Layer 2, ADR-024), so a stale overlay dies with the connection; re-import on reconnect is naturally scoped to the new connection. This is the right default for the runner pattern (a worker reconnects → the hub re-discovers the worker's ops automatically). Explicit re-import via a future CallConnection::refresh() method is additive and can be added if a deployment needs manual control. Reversal is cheap; no ADR needed.
Cross-references: ADR-017, ADR-024, client-and-adapters.md

OQ-28: from_call Namespace Collision Behavior

Origin: client-and-adapters.md, ADR-017 §3
Status: open
Door type: Two-way
Priority: low
Resolution: ADR-017 §3's FromCallConfig namespace prefix is optional, default no prefix, collision = error. A node importing from two remotes that both expose /container/exec without prefixes should fail loudly rather than silently overwrite. The operator adds prefixes when they know they're importing from multiple sources. This matches the default-deny, explicit-allow posture (ADR-015, and ADR-028 before it was superseded). Reversal is cheap; no ADR needed. The alternative (last-wins) would silently mask one remote's op behind another's, which is the kind of surprise the default-deny posture exists to avoid.

Cross-peer collision dissolved by ADR-029. Under the peer-keyed overlay model, same name on different peers is fine — they live in separate peer sub-overlays, no collision, no prefix needed. The collision rule now stays only within a peer (same name on the same peer is still an error — a peer shouldn't expose two ops with the same name). FromCallConfig::namespace_prefix becomes optional local-naming sugar, not the disambiguation mechanism. See ADR-029 §5.
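The post-ADR-029 rule can be sketched as a nested map keyed first by peer, then by op name. This is an illustration under assumptions, not the alknet-call overlay: PeerOverlays, import, and lookup are hypothetical names, PeerId is reduced to an integer newtype, and the op handle is a placeholder String.

```rust
use std::collections::HashMap;

/// Stand-in for the logical peer id (the v1 source is a UUID; see OQ-33).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PeerId(pub u128);

/// Error returned when one peer exposes the same op name twice.
#[derive(Debug, PartialEq, Eq)]
pub struct SamePeerCollision {
    pub peer: PeerId,
    pub op: String,
}

/// Hypothetical peer-keyed overlay: one sub-overlay per peer.
#[derive(Default)]
pub struct PeerOverlays {
    by_peer: HashMap<PeerId, HashMap<String, String>>,
}

impl PeerOverlays {
    /// Registers `op` under `peer`. The same name on a different peer is
    /// fine; the same name on the same peer is an error.
    pub fn import(&mut self, peer: PeerId, op: &str, handle: &str) -> Result<(), SamePeerCollision> {
        let overlay = self.by_peer.entry(peer).or_default();
        if overlay.contains_key(op) {
            return Err(SamePeerCollision { peer, op: op.to_string() });
        }
        overlay.insert(op.to_string(), handle.to_string());
        Ok(())
    }

    /// Looks up an op inside one peer's sub-overlay.
    pub fn lookup(&self, peer: PeerId, op: &str) -> Option<&str> {
        self.by_peer.get(&peer)?.get(op).map(String::as_str)
    }
}
```

Importing /container/exec from two different peers succeeds with no prefix, while a second import of the same name from the same peer returns SamePeerCollision.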
Cross-references: ADR-015, ADR-017, ~~ADR-028~~ (superseded), ADR-029, client-and-adapters.md

OQ-29: CallClient TLS Client-Auth and Remote-Identity Verification

Origin: client-and-adapters.md, ADR-017 §7
Status: open
Door type: Two-way
Priority: medium
Resolution: v1 CallClient::connect() builds the quinn client config with with_no_client_auth() and an AcceptAnyServerCertVerifier — the client does not present its TLS identity (credentials.tls_identity) as a client cert, and does not pin the remote's expected identity from credentials.remote_identity. The server-side AcceptAnyCertVerifier (in alknet-core's endpoint) does not require or verify client certs, so a client cert is not needed to establish a connection in v1. Wiring the local node's RawKey/X509 identity as a rustls client-auth cert (for servers that do verify client identity) and plugging credentials.remote_identity into a real ServerCertVerifier is additive — a two-way-door remainder surfaced during implementation. The one-way constraint (credentials from Capabilities, not env vars, ADR-014) is unaffected: the auth_token dimension flows through the call-protocol auth_token payload field, not TLS, so the no-env-vars invariant holds independently of this gap. Decided during a future task that wires RawKey client-auth; recorded here, not in a full ADR.
Cross-references: ADR-014, ADR-017, ADR-027, client-and-adapters.md, endpoint.md

OQ-30: PeerRef::Any Routing Policy

Origin: ADR-029 §2, client-and-adapters.md, docs/research/alknet-call-peer-routing/findings.md §3.2
Status: open
Door type: Two-way
Priority: low
Resolution: v1 PeerRef::Any uses insertion-order first-match — deterministic but order-dependent (worker A connects before worker B → Any routes to A until A disconnects). This is the simplest routing policy and is correct for the immediate use case (the head picks the first worker that serves the op). A richer RoutingPolicy (round-robin, least-loaded, affinity) is the two-way-door remainder; the PeerRef enum is designed to compose with a Route { selector, policy } struct without breaking the invoke_peer signature. Decided during implementation when a fan-out use case needs it; recorded here, not in a full ADR.
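Insertion-order first-match can be sketched as a linear scan over peers kept in connection order. Only the PeerRef::Any and PeerRef::Specific variant names come from this document; the Peer struct, the resolve function, and the integer PeerId are assumptions made for the example.

```rust
/// Stand-in for the logical peer id (see OQ-33).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PeerId(pub u128);

/// Routing selector. Variant names follow the document; the payload
/// shape is an assumption.
pub enum PeerRef {
    Any,
    Specific(PeerId),
}

/// Hypothetical view of a connected peer and the ops it serves.
pub struct Peer {
    pub id: PeerId,
    pub ops: Vec<String>,
}

/// Resolves a selector against peers stored in connection order.
/// `Any` returns the earliest-connected peer that serves `op`.
pub fn resolve(peers: &[Peer], target: &PeerRef, op: &str) -> Option<PeerId> {
    let serves = |p: &&Peer| p.ops.iter().any(|o| o == op);
    match target {
        PeerRef::Any => peers.iter().find(serves).map(|p| p.id),
        PeerRef::Specific(id) => peers
            .iter()
            .filter(|p| p.id == *id)
            .find(serves)
            .map(|p| p.id),
    }
}
```

With workers A and B both serving the same op and A connected first, Any resolves to A; once A is removed from the slice, the same call resolves to B. That order dependence is the behavior a future RoutingPolicy would replace.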
Cross-references: ADR-029, client-and-adapters.md

OQ-31: services/list-peers Re-Export Semantics

Origin: ADR-029 §6, docs/research/alknet-call-peer-routing/findings.md §3.5
Status: open
Door type: Two-way
Priority: low
Resolution: v1 defaults to "own ops only" — services/list shows the head's own Layer 0 External ops, filtered by AccessControl::check(calling_peer), unchanged from today (minus the remote_safe filter). A services/list-peers opt-in (new built-in operation) lists the peer overlays with attribution: each peer's sub-overlay listed as { peer: Option<PeerId>, operations: [...] }, filtered by the calling peer's authorization. Whether re-exported peer ops are listed by default, opt-in, or per-peer-policy is the two-way-door remainder; v1 is opt-in (services/list-peers). The re-export policy is an AccessControl decision on the listing op. Decided during implementation when a consumer needs peer-attributed discovery; recorded here, not in a full ADR.
Cross-references: ADR-029, client-and-adapters.md

OQ-32: Multi-Hop Federation

Origin: ADR-029 §3.7, docs/research/alknet-call-peer-routing/findings.md §3.7
Status: open
Door type: One-way (federation model), two-way (mechanism)
Priority: low
Resolution: v1 is one-hop: worker A does not transitively see worker B's ops through the head unless the head explicitly re-exports them. The peer-keyed overlay model extends to multi-hop without redesign (a chain of PeerRef::Specific routing decisions), but path-finding (which peer reaches which op transitively) is where a graph library (petgraph) would pay off. For v1 (one hop, shallow), a nested HashMap<PeerId, HashMap<String, ...>> suffices. Whether multi-hop federation becomes a real use case is a future decision; the peer-keyed model does not foreclose it. Not designed; tracked here so the v1 model's extensibility is recorded.
Cross-references: ADR-029, client-and-adapters.md

OQ-33: PeerId — Cryptographic Identity vs Stable Logical Identifier

Origin: ADR-029 Assumption 1, docs/research/alknet-call-peer-routing/findings.md §6.1
Status: resolved (2026-06-27)
Door type: One-way (composition semantics), two-way (id source)
Priority: high
Resolution: PeerId is a logical identifier, decoupled from the cryptographic identity. It is not Identity.id (the TLS fingerprint or API-key prefix) — those change on key rotation, which would break every in-flight PeerRef::Specific and every ACL entry referencing that peer.

v1 source: connection-assigned UUID (v4) at connect()/accept() time. Stable for the connection's lifetime; changes on reconnect. This is a no-storage workaround: the project has deliberately avoided a DB backend for the core crates (smaller, fewer deps, simpler testing), which has served the local-only crates (vault, registry) well. But peer identity is the first cross-node state that wants persistence: what we actually want is a persistent mapping from a logical peer identity to its current cryptographic material, updated on key rotation, surviving restarts. Without a DB, the UUID is the least-bad ephemeral option. The failure mode (an in-flight PeerRef::Specific gets NOT_FOUND after a reconnect) is acceptable for v1, and re-running from_call on the new connection produces a fresh PeerRef.
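The connection-scoped lifetime of the v1 id can be sketched as follows. Everything here is illustrative: Connections, accept, disconnect, and fingerprint are hypothetical names, and a monotonic counter stands in for the UUID v4 so the example needs no external crate. The point it shows is that the id is assigned per connection and is independent of the peer's crypto material.

```rust
use std::collections::HashMap;

/// Logical, connection-assigned id. A counter stands in for UUID v4 here.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PeerId(u64);

/// Mirrors the NOT_FOUND outcome for a PeerRef whose connection is gone.
#[derive(Debug, PartialEq, Eq)]
pub struct NotFound;

/// Hypothetical in-memory connection table: no persistence, so ids do
/// not survive a reconnect.
#[derive(Default)]
pub struct Connections {
    next: u64,
    live: HashMap<PeerId, String>, // PeerId -> fingerprint seen on this connection
}

impl Connections {
    /// Assigns a fresh PeerId at accept time, regardless of fingerprint.
    pub fn accept(&mut self, fingerprint: &str) -> PeerId {
        self.next += 1;
        let id = PeerId(self.next);
        self.live.insert(id, fingerprint.to_string());
        id
    }

    /// Drops the connection; its PeerId is never reused.
    pub fn disconnect(&mut self, id: PeerId) {
        self.live.remove(&id);
    }

    /// Resolves a PeerId to the crypto identity of its live connection.
    pub fn fingerprint(&self, id: PeerId) -> Result<&str, NotFound> {
        self.live.get(&id).map(String::as_str).ok_or(NotFound)
    }
}
```

A worker that reconnects, even with the same fingerprint, receives a new PeerId, and lookups against the old id return NotFound. That is the accepted v1 failure mode described above.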

The real solution (future, tracked as OQ-34): a persistent peer registry — a mapping from a stable logical peer identity (configured node name or registered identity) to its current cryptographic material, persisted across restarts and key rotations. This is what makes the ACL-stability concern below work correctly: the ACL entry keys on the logical name, the peer registry tracks the current crypto identity for that name, and key rotation becomes a vault-only operation with no ACL update on the remote side. The no-DB posture of the core crates means this registry lives outside the core — likely in a service crate or an assembly-layer store — not in alknet-call itself. See OQ-34.

Key-rotation / ACL note (context for the future, not a v1 decision): if PeerId were the fingerprint, rotating a node's TLS key would change its PeerId, invalidating every ACL entry that references that peer. The vault makes local key rotation easy (derive a new key, re-encrypt, ADR-021); the problem is the remote side's ACL — the hub's authorized_fingerprints / AccessControl entries that reference the old fingerprint. Decoupling PeerId from the crypto material means the ACL entry can persist across key rotation — but only if there's a store that maps the logical name to the new crypto identity after rotation. That store is OQ-34. The v1 decision (logical id, not crypto; UUID source) keeps the door open for it without requiring it now.

The one-way door: PeerId is a logical id, not Identity.id. This determines the PeerCompositeEnv key type, the PeerRef::Specific payload type, and the ScopedPeerEnv.peer_pinned entry shape. Reversing it (switching to Identity.id) would break the peer-keyed overlay, the routing selector, and the reachability set simultaneously. The source of the logical id (UUID now, peer registry later) is the two-way-door remainder — switching from UUID to a persistent registry changes the id-generation path, not the composition model.
Cross-references: ADR-009, ADR-014, ADR-015, ADR-017, ADR-021, ADR-027, ADR-029, OQ-34, client-and-adapters.md, operation-registry.md, auth.md

OQ-34: Persistent Peer Registry (Cross-Node State Storage)

Origin: OQ-33 (the storage dimension it surfaced), the no-DB posture of ADR-008/018/025
Status: open
Door type: One-way (storage boundary), two-way (backend choice)
Priority: medium (not a v1 blocker — UUID works for v1; becomes real when key rotation across nodes or peer-attribution persistence matters)
Resolution: The core crates (alknet-core, alknet-call, alknet-vault) are deliberately storage-free — no DB, no persistence layer, in-memory state only. This has kept the core small and testable, and it works for local-only state (vault key rotation is version-indexed paths, no DB needed, ADR-021). Peer identity is the first cross-node state that wants persistence: a stable logical peer identity mapped to its current cryptographic material, surviving restarts and key rotations. The v1 workaround (OQ-33: connection-assigned UUID) is ephemeral — it works for the immediate use case (head→workers, operator-controlled, reconnects produce a fresh UUID) but doesn't support ACL entries that persist across key rotation, because there's nowhere to store "worker-a's current crypto identity is X."

What this OQ tracks (not designed, not a v1 decision):
- Whether a persistent peer registry belongs in a service crate (e.g., an alknet-registry or alknet-peer-store), in the assembly layer (a SQLite file the binary owns), or as a new alknet-core abstraction (a PeerRegistry trait with no built-in impl, like IdentityProvider).
- Whether the no-DB posture extends to "core has a trait, service has the impl" (the IdentityProvider pattern) or stays "core is storage-free, persistence is entirely outside the crate graph."
- The backend choice (SQLite, a key-value store, a config file) is the two-way-door remainder; the storage boundary (does core know about persistence at all?) is the one-way door.
Why this is a one-way door on the storage boundary, not a two-way door: if core gains a PeerRegistry trait, downstream crates depend on it and the trait shape becomes a contract. If core stays storage-free, the registry lives in a service crate and core never knows about persistence. Reversing either direction breaks downstream consumers. The decision should be made when a concrete use case (key rotation across nodes, durable peer attribution, multi-hop federation with OQ-32) forces it — not before.
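To make the "core has a trait, service has the impl" option concrete, here is one possible shape. It is purely illustrative and not a proposed design: the PeerRegistry method names, the InMemoryRegistry type, and the is_authorized helper are all hypothetical, and a real backend would persist the mapping instead of holding it in a HashMap.

```rust
use std::collections::HashMap;

/// Hypothetical trait a core crate could expose with no built-in impl.
/// Maps a stable logical peer name to its current crypto material.
pub trait PeerRegistry {
    fn current_fingerprint(&self, name: &str) -> Option<String>;
    fn rotate(&mut self, name: &str, new_fingerprint: &str);
}

/// Illustrative backend living outside core. A durable store would sit
/// behind the same trait.
#[derive(Default)]
pub struct InMemoryRegistry {
    entries: HashMap<String, String>,
}

impl PeerRegistry for InMemoryRegistry {
    fn current_fingerprint(&self, name: &str) -> Option<String> {
        self.entries.get(name).cloned()
    }

    fn rotate(&mut self, name: &str, new_fingerprint: &str) {
        self.entries.insert(name.to_string(), new_fingerprint.to_string());
    }
}

/// ACL entries key on logical names; the registry supplies the current
/// fingerprint, so the ACL itself never changes on key rotation.
pub fn is_authorized(registry: &dyn PeerRegistry, acl: &[&str], presented: &str) -> bool {
    acl.iter()
        .any(|&name| registry.current_fingerprint(name).as_deref() == Some(presented))
}
```

After rotate("worker-a", ...) the same ACL slice authorizes the new fingerprint and rejects the old one. Once downstream crates depend on a trait like this, its method set becomes the contract that makes the storage boundary a one-way door.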

Not a v1 blocker. The UUID works for v1; this OQ exists so the no-DB posture's limit is tracked and the decision is made deliberately when it's needed, not accidentally when someone bolts a SQLite file onto the assembly layer and it becomes load-bearing.
Cross-references: ADR-008, ADR-018, ADR-021, ADR-025, ADR-029, OQ-33, auth.md, config.md
