docs(architecture): resolve OQ-11 and OQ-19 — all open questions resolved

OQ-11 (handler-level auth observability): Option B: handlers store resolved identity on Connection via set_identity. Two identity scopes: connection-level (observability, write-once-read-many) and per-request (ACL, on OperationContext). Per-request takes precedence for ACL; connection-level is for logging/audit only.

OQ-19 (session-scoped registries): Protocol doesn't need changes. OperationEnv must remain a trait (not concrete) to enable session-overlay pattern. Three-tier registry: core (static, External+Internal), session (dynamic, Internal-only), promotion (curated review). Documented as implementation guard in operation-registry.md.

All 19 open questions are now resolved. No open one-way or two-way doors remain. The architecture is ready for review and implementation.
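
Reviewer sketch for the OQ-11 decision: a write-once connection-level identity for observability, with ACL reading only the per-request identity. This is a minimal illustration under stated assumptions, not the alknet-core API. `Identity`, `Connection`, and `OperationContext` here are simplified stand-ins, `acl_identity` is a hypothetical helper, and the `bool` return from `set_identity` is an assumption (the spec only says the value is write-once-read-many and names `OnceLock<Identity>` as one candidate).

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for the resolved identity type.
#[derive(Debug, Clone, PartialEq)]
struct Identity(String);

/// Hypothetical connection: only the identity slot is modelled.
#[derive(Default)]
struct Connection {
    identity: OnceLock<Identity>,
}

impl Connection {
    /// Write-once: the first call stores the identity and returns true;
    /// later calls return false and leave the stored value unchanged.
    /// (Returning bool is an assumption made for this sketch.)
    fn set_identity(&self, identity: Identity) -> bool {
        self.identity.set(identity).is_ok()
    }

    /// Read-many: logging and audit layers read whatever the handler stored.
    fn identity(&self) -> Option<&Identity> {
        self.identity.get()
    }
}

/// Hypothetical per-request context, resolved per `call.requested`.
struct OperationContext {
    identity: Option<Identity>,
}

/// ACL reads the per-request identity only; the connection-level
/// identity is never consulted here.
fn acl_identity(ctx: &OperationContext) -> Option<&Identity> {
    ctx.identity.as_ref()
}

fn main() {
    // The handler and the endpoint share the same connection object.
    let conn = Arc::new(Connection::default());
    let endpoint_view = Arc::clone(&conn);

    // The handler resolves identity once inside handle().
    assert!(conn.set_identity(Identity("alice".to_string())));
    // A second write is rejected, so the audit trail stays stable.
    assert!(!conn.set_identity(Identity("mallory".to_string())));
    assert_eq!(
        endpoint_view.identity(),
        Some(&Identity("alice".to_string()))
    );

    // A later request on the same connection carries a different token.
    let ctx = OperationContext {
        identity: Some(Identity("bob".to_string())),
    };
    assert_eq!(acl_identity(&ctx), Some(&Identity("bob".to_string())));
}
```

With `OnceLock`, the write-once property is enforced by the type rather than by convention; an `RwLock<Option<Identity>>` would need an explicit check to get the same guarantee.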
2026-06-19 06:05:04 +00:00
parent 8f19eb8861
commit c0a322ac29
7 changed files with 53 additions and 25 deletions
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -9,7 +9,7 @@ last_updated: 2026-06-21

 **Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable) and research/reference material. Foundational ADRs (001–017) are in place, including the BiStream type definition (ADR-007), vault integration (ADR-008), ALPN router/endpoint (ADR-010), AuthContext structure (ADR-011), call protocol stream model (ADR-012), Rust as canonical implementation language (ADR-013), secret material flow with capability injection (ADR-014), privilege model with authority context (ADR-015), abort cascade for nested calls (ADR-016), and call protocol client and adapter contract (ADR-017). The alknet-core and alknet-call crate specs are in draft.

-**Next step**: Review alknet-call spec documents, then begin implementation. OQ-11 (handler-level auth resolution observability) will be resolved during implementation.
+**Next step**: Review alknet-call spec documents, then begin implementation. All open questions are resolved.

 ## Architecture Documents

@@ -67,17 +67,18 @@ See [open-questions.md](open-questions.md) for the full tracker.
 **Resolved two-way doors:**
 - **OQ-04**: Dynamic handler registration — static at startup (ADR-010)
 - **OQ-07**: Call protocol scope — bidirectional streams, EventEnvelope, ID-based correlation (ADR-012)
+- **OQ-11**: Handler-level auth resolution observability — handlers store resolved identity on Connection (Option B); two identity scopes: connection-level (observability) and per-request (ACL)
 - **OQ-12**: TLS identity provisioning — two use cases: RFC 7250 raw keys (default, P2P) and X.509 certs (domain-hosted, browsers). ACME is a proven pattern.
 - **OQ-13**: Operation path format — `/{service}/{op}` is the correct design for alknet-call, not a simplification
 - **OQ-14**: Batch operation semantics — multiple correlated `call.requested` events is the correct protocol design, not a simplification
-
-**Open two-way doors (resolved during implementation):**
- **OQ-11**: Handler-level auth resolution observability — decide during implementation
+- **OQ-19**: Session-scoped registries — agent-written operations via `OperationEnv` trait layering; protocol doesn't need changes; `OperationEnv` must remain a trait

 **Deferred (not active):**
 - **OQ-09**: WASM target boundaries — design constraint, not deliverable
 - **OQ-10**: Git adapter scope — start with smart protocol, add ERC721 later

+**All open questions are resolved.** No open one-way or two-way doors remain. The architecture is ready for review.
+
 ## Document Lifecycle

 | Status | Meaning | Transitions |
--- a/docs/architecture/crates/call/README.md
+++ b/docs/architecture/crates/call/README.md
@@ -41,7 +41,7 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions,
 | OQ-13 | Operation path format and routing scope | resolved | `/{service}/{op}` is the correct design; remote dispatch is a separate layer |
 | OQ-14 | Batch operation semantics | resolved | Correlated `call.requested` events is the correct protocol design |
 | OQ-16 | Safe vault operations for call protocol exposure | resolved (ADR-014) | None exposed for now |
-| OQ-19 | Session-scoped operation registries | open | Agent-written operations overlaid on global registry via `OperationEnv` trait layering. Protocol doesn't need changes; one-way door is not closing the trait-based composition point |
+| OQ-19 | Session-scoped operation registries | resolved | Agent-written operations overlaid on global registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait |

 ## Key Design Principles

--- a/docs/architecture/crates/call/call-protocol.md
+++ b/docs/architecture/crates/call/call-protocol.md
@@ -312,7 +312,7 @@ See [open-questions.md](../../open-questions.md) for full details.
 - **OQ-13** (resolved): Operation path format is `/{service}/{op}`. Remote dispatch is a separate mechanism, not a path prefix.
 - **OQ-14** (resolved): Batch is a client-side pattern of correlated `call.requested` events, not a protocol primitive.
 - **OQ-16** (resolved by ADR-014): No vault operations are exposed over the call protocol for now.
- **OQ-19** (open): Session-scoped operation registries — agent-written operations overlaid on global registry via `OperationEnv` trait layering. Protocol doesn't need changes.
+- **OQ-19** (resolved): Session-scoped operation registries — agent-written operations overlaid on global registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait.

 ## References

--- a/docs/architecture/crates/call/operation-registry.md
+++ b/docs/architecture/crates/call/operation-registry.md
@@ -188,6 +188,8 @@ impl OperationEnv for LocalOperationEnv {

 Future work may add irpc service dispatch and remote call protocol dispatch as additional backends. The handler-facing API stays the same.

+**`OperationEnv` must remain a trait.** This is a constraint, not a suggestion. The trait-based design enables session-scoped registries (OQ-19) — a session env wraps the global env (check session registry first, fall through to global). Making `OperationEnv` concrete or hardcoding the global registry into the dispatch path would close the session-overlay pattern. See OQ-19.
+
 ### Service Discovery

 Two built-in operations expose what the node offers:
@@ -324,7 +326,7 @@ See [open-questions.md](../../open-questions.md) for full details.
 - **OQ-13** (resolved): Operation path format is `/{service}/{op}`. Remote dispatch is a separate mechanism, not a path prefix.
 - **OQ-14** (resolved): Batch is a client-side pattern of correlated `call.requested` events, not a protocol primitive.
 - **OQ-16** (resolved by ADR-014): No vault operations are exposed over the call protocol for now.
- **OQ-19** (open): Session-scoped operation registries — agent-written operations overlaid on the global registry via `OperationEnv` trait layering. Protocol doesn't need changes; one-way door is not closing the trait-based composition point.
+- **OQ-19** (resolved): Session-scoped operation registries — agent-written operations overlaid on the global registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait.

 ## References

--- a/docs/architecture/crates/core/README.md
+++ b/docs/architecture/crates/core/README.md
@@ -1,6 +1,6 @@
 ---
 status: draft
-last_updated: 2026-06-16
+last_updated: 2026-06-21
 ---

 # alknet-core
@@ -36,7 +36,7 @@ Core library for ALPN-based protocol dispatch. Every handler crate depends on al
 |----|-------|--------|-----------|
 | OQ-04 | Dynamic handler registration | resolved (start static) | HandlerRegistry is immutable at startup |
 | OQ-05 | Multi-connectivity endpoint | resolved (quinn + iroh) | AlknetEndpoint supports both, both feature-gated |
-| OQ-11 | AuthContext resolution completeness | open | How handlers signal auth completion |
+| OQ-11 | Handler-level auth resolution observability | resolved | Handlers store resolved identity on Connection; two identity scopes (connection-level for observability, per-request for ACL) |

 ## Key Design Principles

--- a/docs/architecture/crates/core/auth.md
+++ b/docs/architecture/crates/core/auth.md
@@ -1,6 +1,6 @@
 ---
 status: draft
-last_updated: 2026-06-16
+last_updated: 2026-06-21
 ---

 # Authentication
@@ -44,7 +44,7 @@ The endpoint constructs `AuthContext` from the QUIC connection:

 ### Handler-level resolution

-Handlers that require authentication extract protocol-specific credentials and call `IdentityProvider` inside `handle()`:
+Handlers that require authentication extract protocol-specific credentials and call `IdentityProvider` inside `handle()`. When identity is resolved, the handler stores it on the `Connection` for observability:

 ```rust
 // Example: CallAdapter extracting an AuthToken from the first frame
@@ -59,11 +59,25 @@ async fn handle(&self, connection: Connection, auth: &AuthContext) -> Result<(),
                .ok_or(HandlerError::AuthRequired)?
        }
    };
+    connection.set_identity(identity);  // Store for observability (OQ-11)
    // ... proceed with authenticated identity
 }
 ```

-Handlers that don't require authentication (e.g., DNS resolver, health check) can ignore `auth.identity` entirely.
+Handlers that don't require authentication (e.g., DNS resolver, health check) can ignore `auth.identity` entirely and don't call `set_identity`.
+
+### Two Identity Scopes
+
+There are two distinct identity scopes that must not be conflated:
+
+| Scope | Where it's set | Where it's stored | What it represents | Used for |
+|-------|---------------|-------------------|-------------------|----------|
+| Connection-level | Handler in `handle()` | `Connection` (via `set_identity`) | Who opened this QUIC connection | Observability, logging, audit |
+| Per-request | `CallAdapter` per `call.requested` | `OperationContext.identity` | Who is making this specific call | ACL (ADR-015) |
+
+The connection-level identity is stable — set once when the handler resolves it. The per-request identity is dynamic — resolved per `call.requested`, potentially different across requests on the same connection (if different auth tokens are used). The per-request identity takes precedence for ACL on `OperationContext`; the connection-level identity is for observability only, not for ACL.
+
+`Connection` exposes `set_identity` via interior mutability — the handler sets it once when resolved, the endpoint and observability layers read it. The identity is write-once-read-many.

 ### AuthContext is Clone and immutable

@@ -231,7 +245,8 @@ The endpoint's `AlknetEndpoint` also holds `Arc<dyn IdentityProvider>` for endpo
 | AuthContext with optional Identity | [ADR-011](../../decisions/011-authcontext-structure.md) | Explicit None, not "partially authenticated" |
 | AuthContext is immutable in handle() | [ADR-011](../../decisions/011-authcontext-structure.md) | Handlers create local variables for resolved identity |
 | Two resolution paths | [ADR-004](../../decisions/004-auth-as-shared-core.md) | Fingerprint and token, not phased auth |
+| Handler stores resolved identity on Connection | OQ-11 (resolved) | `connection.set_identity()` — write-once-read-many for observability |

 ## Open Questions

- **OQ-11**: See [open-questions.md](../../open-questions.md) — handler-level auth resolution observability.
+None. All auth-related open questions are resolved.
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -120,11 +120,23 @@ These questions are acknowledged but not active. They will be promoted to open w
 ### OQ-11: Handler-Level Auth Resolution Observability

 - **Origin**: [auth.md](crates/core/auth.md)
- **Status**: open
+- **Status**: resolved
 - **Door type**: Two-way
 - **Priority**: medium
- **Resolution**: When a handler resolves identity inside `handle()`, should the resolved `Identity` be stored somewhere for observability (e.g., connection logging), or is the handler's local variable sufficient? Options: (A) handlers return the resolved identity from `handle()`, (B) handlers call a method on Connection to set identity, (C) handlers log locally and the resolved identity stays local. Two-way door — can be decided during implementation.
- **Cross-references**: ADR-004, ADR-011
+- **Resolution**: **Option B — handlers store resolved identity on the Connection.** When a handler resolves identity inside `handle()` (the handler-level auth phase), it calls `connection.set_identity(identity)` to store the resolved `Identity` on the connection object. The endpoint and observability layers can read it later for connection logging, audit trails, and metrics.
+
+  Why not Option A (return identity from `handle()`): it changes the `ProtocolHandler` trait signature for all handlers, even those that don't do auth resolution (DNS, health check). It also assumes one identity per connection — but the call protocol can have different identities per request on the same connection (one connection, multiple `call.requested` events with different auth tokens). Returning a single identity from `handle()` would be misleading for the call protocol.
+
+  Why not Option C (identity stays local): the resolved identity is useful beyond the handler. The endpoint may want to log "connection from X authenticated as Y." A connection-level observability layer needs the identity. If it stays local, every handler that resolves identity would need to duplicate logging logic, and the endpoint can't correlate connections to identities.
+
+  **Two identity scopes exist and must not be conflated:**
+  - **Connection-level identity** (this decision): set once by the handler in `handle()`, stored on `Connection`, read by the endpoint for logging/observability. This is the "connection owner" — who opened this QUIC connection.
+  - **Per-request identity** (already in the call protocol spec): set per `call.requested` by the `CallAdapter`, stored on `OperationContext.identity`. This is the "call caller": who is making this specific call. It may change mid-session when different auth tokens are used on the same connection.
+
+  Both exist. The connection-level identity is the stable "who is this connection from"; the per-request identity is the dynamic "who is this specific call from." The call protocol's per-request resolution (which may produce a different identity than the connection-level resolution) takes precedence for ACL on `OperationContext` — the connection-level identity is for observability only, not for ACL.
+
+  `Connection` exposes `set_identity` via interior mutability (`OnceLock<Identity>` or `RwLock<Option<Identity>>`): the handler sets it once when resolved, and the endpoint and observability layers read it. `handle()` receives `Connection` by value (owned), but the endpoint may also hold a reference for logging. The identity is write-once-read-many.
+- **Cross-references**: ADR-004, ADR-011, ADR-015 (per-request identity on OperationContext), [auth.md](crates/core/auth.md)

 ### OQ-12: TLS Identity Provisioning in AlknetEndpoint

@@ -204,10 +216,10 @@ These questions are acknowledged but not active. They will be promoted to open w
 ### OQ-19: Session-Scoped Operation Registries and Agent-Written Operations

 - **Origin**: [operation-registry.md](crates/call/operation-registry.md)
- **Status**: open
+- **Status**: resolved
 - **Door type**: Two-way (protocol doesn't need changes), one-way (if implementation closes the door)
 - **Priority**: medium
- **Resolution**: The agent service pattern includes a self-improving workflow where agents write their own operations (tools, scripts) within a session. A POC at `/workspace/toolEnv` demonstrated the mechanism: a quickjs WASM sandbox inside Deno web workers, with a `Proxy`-based env that intercepts property access and bridges to the operation registry via `postMessage`. The sandbox runs with locked-down permissions (no net, no fs, no env). The POC exposed the full registry to the sandbox — a security gap that the scoped composition env (OQ-18) addresses.
+- **Resolution**: The call protocol supports session-scoped registries through `OperationEnv` trait layering. No protocol changes needed. The pattern is documented here and in [operation-registry.md](crates/call/operation-registry.md) to prevent an implementation from accidentally closing it.

  The registry model has three tiers:

@@ -217,13 +229,11 @@ These questions are acknowledged but not active. They will be promoted to open w
  | Session | One session | Session lifetime, dynamic | Internal only (never wire-facing) | Agent during session (sandbox) |
  | Promotion | Session → Core | One-time transition | Manual/curated review | Human or architect agent reviews, then redeploys |

-  Session-scoped operations are always `Internal` (never wire-facing, never in `services/list`), run under the handler's identity (the agent handler that authorized the sandbox), can only compose operations in the handler's scoped env, and are ephemeral (gone when the session ends). Core operations are curated — reviewed by a human or architect agent before promotion. The promotion path is the curation checkpoint where autonomous (session-scoped) becomes curated (core). This is not auto-promotion.
+  Session-scoped operations are always `Internal` (ADR-015), run under the handler's identity (the agent handler that authorized the sandbox), can only compose operations in the handler's scoped env, and are ephemeral (gone when the session ends). Core operations are curated — reviewed before promotion. The promotion path is the curation checkpoint where autonomous (session-scoped) becomes curated (core). This is not auto-promotion.

-  The call protocol does not need changes to support this. The `OperationEnv` trait is the composition point — a session-scoped env wraps the global env (check session registry first, fall through to global). The protocol constraints all apply regardless of which registry an operation lives in: abort cascade (OQ-17), privilege model (OQ-18), visibility (OQ-18), capabilities (ADR-014). The static registration constraint (OQ-04) applies to the global registry only; session registries are dynamic by nature and are a different registry overlaying the global one.
+  **Implementation guard**: `OperationEnv` must remain a trait, not a concrete type. A session-scoped env wraps the global env (check session registry first, fall through to global). Making `OperationEnv` concrete or hardcoding the global registry into the dispatch path would close this pattern. The static registration constraint (OQ-04) applies to the global registry only; session registries are dynamic by nature and are a different registry overlaying the global one.

-  The one-way door this OQ guards against: an implementation that makes `OperationEnv` concrete instead of a trait, or hardcodes the global registry into the dispatch path, would close the session-overlay pattern. The trait-based design already accommodates layering — this OQ documents the pattern so a future implementation doesn't accidentally close it.
+  Session-scoped operations run in a locked-down sandbox (no direct net/fs/env access), can only reach operations in the handler's scoped env, and their output should be validated against their declared schema before returning. The promotion path requires review — an agent with a `promote` scope (the architect role) performs the promotion; the writing agent (lower-privileged role) requests it. This is the role-based escalation pattern (ADR-015): privileges escalate through a chain of command, not through direct authority.

-  The security boundary: session-scoped operations run in a locked-down sandbox (no direct net/fs/env access), can only reach operations in the handler's scoped env, and their output should be validated against their declared schema before returning. The promotion path requires review — an agent with a `promote` scope (the architect role) performs the promotion; the writing agent (lower-privileged role) requests it. This is the role-based escalation pattern: privileges escalate through a chain of command, not through direct authority.
-
-  This is a protocol-level concern in the sense that the protocol must not prevent it, but the agent-specific mechanism (quickjs sandbox, session registry lifecycle, promotion workflow) belongs to the agent crate spec. The call protocol's job is to keep the `OperationEnv` trait composable and the visibility/ACL model consistent across tiers.
- **Cross-references**: OQ-04, OQ-17, OQ-18, ADR-014, [operation-registry.md](crates/call/operation-registry.md)
+  The agent-specific mechanism (quickjs sandbox, session registry lifecycle, promotion workflow) belongs to the agent crate spec. The call protocol's job is to keep the `OperationEnv` trait composable and the visibility/ACL model consistent across tiers.
+- **Cross-references**: OQ-04, ADR-014, ADR-015, ADR-016, [operation-registry.md](crates/call/operation-registry.md)
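
Reviewer sketch for the OQ-19 implementation guard: because `OperationEnv` is a trait, a session env can wrap the global env, check the session registry first, and fall through to the global one. This is a minimal sketch under stated assumptions. The trait signature is a simplified, synchronous, string-based stand-in (the real trait in operation-registry.md will differ), and `GlobalEnv`, `SessionEnv`, `Op`, `demo`, and the operation paths are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the `OperationEnv` trait (assumed signature).
trait OperationEnv {
    fn call(&self, path: &str, input: &str) -> Option<String>;
}

/// Hypothetical operation type: a boxed function from input to output.
type Op = Box<dyn Fn(&str) -> String>;

/// Stand-in for the global (core) registry: filled at startup, then read-only.
struct GlobalEnv {
    ops: HashMap<String, Op>,
}

impl OperationEnv for GlobalEnv {
    fn call(&self, path: &str, input: &str) -> Option<String> {
        self.ops.get(path).map(|op| op(input))
    }
}

/// Session overlay: consults the session registry first, then the wrapped env.
struct SessionEnv<'a> {
    session_ops: HashMap<String, Op>,
    inner: &'a dyn OperationEnv,
}

impl OperationEnv for SessionEnv<'_> {
    fn call(&self, path: &str, input: &str) -> Option<String> {
        match self.session_ops.get(path) {
            Some(op) => Some(op(input)),
            None => self.inner.call(path, input),
        }
    }
}

/// Returns (session hit, fall-through to global, session op seen from global).
fn demo() -> (Option<String>, Option<String>, Option<String>) {
    let mut global = GlobalEnv { ops: HashMap::new() };
    global
        .ops
        .insert("/text/echo".to_string(), Box::new(|s: &str| s.to_string()));

    let mut session = SessionEnv {
        session_ops: HashMap::new(),
        inner: &global,
    };
    // An agent-written operation registered for this session only.
    session.session_ops.insert(
        "/session/shout".to_string(),
        Box::new(|s: &str| s.to_uppercase()),
    );

    (
        session.call("/session/shout", "hi"),
        session.call("/text/echo", "hi"),
        global.call("/session/shout", "hi"),
    )
}

fn main() {
    let (session_hit, fall_through, from_global) = demo();
    assert_eq!(session_hit.as_deref(), Some("HI"));
    assert_eq!(fall_through.as_deref(), Some("hi"));
    // The global registry never sees session-scoped operations.
    assert_eq!(from_global, None);
}
```

If `OperationEnv` were a concrete type, `SessionEnv` could not be substituted at the dispatch site, which is the one-way door the guard describes.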