docs(arch): set OQ-42 decision direction — repo/adapter storage + Option 2 integration

Two structural decisions for dynamic resource ownership (OQ-42), recorded in the OQ so ADR drafting starts from a clear position: 1. Storage side reuses the repo/adapter pattern (ADR-033) — a fourth instance alongside IdentityProvider/IdentityStore/CredentialStore. Trait in alknet-core with an in-memory default adapter; persistence adapter separable. Sync read with ArcSwap + honker-NOTIFY cache invalidation, same shape as ConfigIdentityProvider (ADR-035). No new shape invented; no Phase 0 needed for the storage side. 2. Integration point is Option 2 — AccessControl::check consults the ownership provider directly. Rejected Option 1 (augment identity with a per-request snapshot) because its purity was theatrical — the question 'can X exec into container C' was never purely a function of identity, it just looked that way because the resource set was static. Option 2 makes check's signature honest about what ACL checking is in the presence of dynamic resources. Cost is a check signature change (one-way door, every call site updates) — implementation cost, not semantic cost, per the project's decision principle. Refinement that makes Option 2 clean: OperationSpec gains resource_id_path (JSON pointer into the input, e.g. '$.containerId'). Fits naturally with the existing JSON-Schema-backed input_schema — the pointer is within an existing schema on the same spec. OperationSpec becomes fully self-describing for authorization: resource type, action, and which input field drives the resource lookup, all declared on the spec. Four specifics remain open for the ADR: the no-specific-resource (list) case, teardown coupling, fleet representation (spoke resources on the hub), and composition interaction with dynamic ownership. These were surfaced by choosing Option 2 rather than by leaving the integration point undecided.
2026-07-04 13:04:14 +00:00
parent e29672942c
commit f390550a06
2 changed files with 140 additions and 88 deletions
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -156,7 +156,7 @@ See [open-questions.md](open-questions.md) for the full tracker.
 - **OQ-41**: Stream operators library — a handler-level utility library (filter, map, batch, dedupe, window, etc. on `BoxStream<T>`), prior art in `@alkdev/pubsub/operators.ts`; feature extension, not an architectural decision (the architecture decision — stream composition is handler-level, not protocol-level — is made in ADR-049)
 **Open (blocking, requires ADR before the dependent crate specs):**
- **OQ-42**: Dynamic resource ownership for runtime-spawned resources — surfaced by the alknet-docker POC (containers as `AccessControl` resources), generalized to every "spawn a thing at runtime and expose it over the call protocol" crate (docker, tty, opencode-runner wrapper, `alknet-container` fleet layer). The current `Identity.resources` → `AccessControl::check` model is static (config-sourced via `PeerEntry`/`CompositionAuthority`); runtime-spawned resources with derived ownership don't fit. One-way door at the model level (core/call), two-way at the mechanism level. High priority — blocks the docker/tty/runner/fleet crate specs. Likely warrants a Phase 0 research/POC pass before the ADR.
+- **OQ-42**: Dynamic resource ownership for runtime-spawned resources — surfaced by the alknet-docker POC (containers as `AccessControl` resources), generalized to every "spawn a thing at runtime and expose it over the call protocol" crate (docker, tty, opencode-runner wrapper, `alknet-container` fleet layer). The current `Identity.resources` → `AccessControl::check` model is static (config-sourced via `PeerEntry`/`CompositionAuthority`); runtime-spawned resources with derived ownership don't fit. **Decision direction set**: storage side reuses the repo/adapter pattern (ADR-033, fourth instance alongside `IdentityProvider`/`IdentityStore`/`CredentialStore`); integration point is Option 2 — `AccessControl::check` consults an ownership provider directly, with `OperationSpec` gaining a `resource_id_path` JSON pointer so the spec stays fully self-describing for authorization. Four specifics remain open for the ADR: the no-specific-resource (`list`) case, teardown coupling, fleet representation (spoke resources on the hub), and composition interaction with dynamic ownership. High priority — blocks the docker/tty/runner/fleet crate specs.
 **Deferred (not active):**
 - **OQ-09**: WASM target boundaries — design constraint, not deliverable
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -963,107 +963,159 @@ is a feature extension, not an unmade architecture decision.
 - **Origin**: [alknet-docker POC summary](../../research/alknet-docker/poc-summary.md)
  §"Open Unknowns" #3 (container-as-resource identity model); generalized
  during the Phase 1 review pass triggered by that research finding.
- **Status**: open
+- **Status**: open — the two structural decisions below are made; the
- **Door type**: One-way (the `Identity.resources` → `AccessControl::check`
+  remaining questions (listed under "Open for the ADR") are what the ADR
-  model in core/call), two-way (the mechanism for supplementing it)
+  must settle before the dependent crate specs can be drafted.
 - **Door type**: One-way (the `AccessControl::check` signature change and
  the `OperationSpec.resource_id_path` addition in core/call), two-way (the
  ownership provider mechanism, per the established repo/adapter pattern)
 - **Priority**: high — blocks the alknet-docker, alknet-tty, opencode-runner
  wrapper, and `alknet-container` (fleet normalization) crate specs. None of
-  those specs can declare their `AccessControl` shapes until this is resolved,
+  those specs can declare their `AccessControl` shapes until this is
-  because the model available to them determines what ACL declarations are
+  resolved, because the model available to them determines what ACL
-  even expressible. Permitting "docker picks a per-crate default and the
+  declarations are even expressible. Permitting "docker picks a per-crate
-  others follow" is the door-type-as-deferral anti-pattern (ADR-009 §"What
+  default and the others follow" is the door-type-as-deferral anti-pattern
-  this framework is NOT"): each crate bakes in an ACL shape and downstream
+  (ADR-009 §"What this framework is NOT"): each crate bakes in an ACL shape
-  crates build on whatever default was picked, making the "cheap reversal"
+  and downstream crates build on whatever default was picked, making the
-  expensive.
+  "cheap reversal" expensive.
 - **Resolution**: Not yet made. This OQ records the issue and its scope so the
  architecture workflow can see it; the resolution requires an ADR (and,
  given the one-way door, likely a Phase 0 research/POC pass first).
-  **The issue, generalized.** The alknet-docker POC research flagged that
+- **Resolution (decided so far).**
  containers are a natural resource for `AccessControl` (`resource_type:
  "container"`, `resource_action: "exec"`), but containers are created at
  runtime — the resource set is dynamic, and "who can exec into container C"
  is a function of "who created C," not of a static `PeerEntry.resources`
  entry an operator wrote. The POC agent's one-line summary — "the
  `IdentityProvider` model in alknet-core is currently static (`PeerEntry`
  set). Dynamic resource ownership needs a spec" — is accurate, and the
  consequence it draws is the right one.
-  The issue is broader than the docker case that surfaced it. **Every crate
+  **Decision 1 — storage side: reuse the repo/adapter pattern.** The
-  that spawns a thing at runtime and exposes it over the call protocol hits
+  ownership store is a fourth instance of the established repo/adapter
-  the same shape:**
+  pattern (ADR-033), alongside `IdentityProvider` (ADR-004), `IdentityStore`
  (ADR-035), and `CredentialStore` (ADR-031). Concretely: a trait in
  `alknet-core` (read method: "does identity X own resource R with action
  A?" / "what resources of type T does X own?"; write method: "record X
  spawned R", "revoke R on teardown"), with an in-memory default adapter in
  core. The in-memory default carries the docker/runner cases with no
  backend dependency — ownership is runtime state, meaningless across
  restarts (a container ID from a previous process doesn't exist), so the
  default case has no persistence requirement. A persistence adapter (e.g.
  sqlite/honker-backed, for a hub that wants fleet ownership to survive
  restarts) is separable and built when a concrete use case forces it, same
  as `alknet-store-sqlite` for peer/credential persistence. The read stays
  sync (called from `AccessControl::check` on the dispatch hot path, no
  `.await`), with persistence adapters caching in memory and using honker
  NOTIFY for invalidation — same `ArcSwap`-backed full-reload pattern as
  `ConfigIdentityProvider` (ADR-035). No new shape invented on the storage
  side; no Phase 0 needed for it.
-  | Crate (some prospective) | Runtime-spawned resource | Ownership derivation |
+  **Decision 2 — integration point: Option 2, `check` consults the
-  |--------------------------|--------------------------|----------------------|
+  ownership provider directly.** `AccessControl::check` grows a parameter
-  | alknet-docker | container ID | who created the container |
+  for the ownership provider (or reads one carried on `OperationContext`),
-  | alknet-tty | a TTY session (often bound to a container) | who opened the session |
+  and consults it for `resource_type`/`resource_action` checks against
-  | opencode-runner wrapper (a runner crate that starts an opencode instance in a container and wraps its OpenAPI surface via `alknet-http`'s `from_openapi`) | an opencode instance | who started the instance |
+  runtime-spawned resources. The alternative considered and rejected —
-  | `alknet-container` (fleet normalization, per the POC summary's §6 boundary) | a normalized container across multiple hosts | transitively, who created the underlying container |
+  Option 1, augmenting `Identity.resources` with a per-request snapshot
  before calling `check` — preserves `check`'s purity by moving the
  impurity one frame up the stack: the dispatcher would pull owned
  resources into a per-request identity snapshot so `check` *looks*
  unchanged while reading state that was never part of the static
  identity. The purity was always theatrical (the question "can X exec
  into container C" was never purely a function of identity; it just
  looked that way because the resource set was static). Option 2 makes
  `check`'s signature honest about what ACL checking *is* in the presence
  of dynamic resources: a function of (ACL, Identity, current-ownership-
  state). The impurity is real either way; Option 2 puts it in the
  signature where it's visible, rather than hiding it in a per-request
  snapshot pretending to be static identity. Option 3 (handler-level
  ownership check, `AccessControl` gates only scope) was rejected because
  it splits the ACL story — some resources statically checked, some
  handler-checked — which is the kind of inconsistency that creates the
  "figure out how it fits with what is there" cleanup this OQ exists to
  prevent.
-  All share: (a) resources are runtime-spawned, not config-declared; (b)
+  The cost of Option 2 is a `check` signature change — a one-way door,
-  ownership is derived from creation, not from a static ACL entry; (c) the
+  every call site and test updates. Per the project's decision principle
-  resource set churns continuously (instances start and stop constantly),
+  (implementation workload is a non-issue relative to semantic correctness
-  so any model requiring operator config edits per resource is a non-starter;
+  and long-term clarity; "path of least resistance compounded over many
-  (d) the resource's lifecycle is bound to a process the call protocol
+  decisions is strictly dominated"), this is implementation cost, not a
-  itself is managing, so ownership state and spawn/teardown must be coupled,
+  semantic cost, and does not bias the choice.
  not two separate operator workflows.
-  Solving this inside the docker crate spec would re-solve it identically in
+  **Refinement that makes Option 2 work cleanly: `OperationSpec` declares
-  each of those crates and they would diverge. The model has to live in
+  where the resource ID lives in the input.** `OperationSpec` gains a
-  core/call, once.
+  `resource_id_path: Option<String>` — a JSON pointer into the operation
  input, e.g. `"$.containerId"` for `docker/container/exec`. The dispatcher
  extracts the resource ID from the input using the spec-declared path,
  passes it to `check`, and `check` asks the provider "does this identity
  own `<resource_type>/<resource_id>` with action `<resource_action>`?" —
  a single targeted lookup, not a whole-resource-set pull. The fit with
  JSON Schema is load-bearing, not incidental: `OperationSpec.input_schema`
  is already a JSON Schema, so `resource_id_path` is a pointer *within* an
  existing schema on the same spec. The `OperationSpec` becomes fully
  self-describing for authorization — what resource type, what action, and
  *which input field* drives the resource lookup. No per-namespace
  conventions, no handler-level knowledge, no "the dispatcher just knows."
  The contract is on the spec, where it belongs.
-  **What the current model does, and why it doesn't fit.** Today
+- **Open for the ADR.** The decisions above settle the storage shape and
-  `Identity.resources: HashMap<String, Vec<String>>` is populated on two
+  the integration point. The ADR must still address:
  paths, both static: from `PeerEntry.resources` (fingerprint/auth-token,
  ADR-030) or from `CompositionAuthority.resources` (composition, ADR-015 /
  ADR-022). `AccessControl::check`
  (`crates/alknet-call/src/registry/spec.rs:59–108`) does a literal
  `identity.resources.get(resource_type)` lookup and a string-equals match
  on `resource_action`. There is no notion of "the resource set is dynamic"
  in that function — it reads a map built when the identity was resolved.
  `ConfigIdentityProvider` reads from `ArcSwap<DynamicConfig>` (hot-reloadable
  config), and `IdentityStore` (ADR-035) adds async `put_peer` /
  `update_peer` / `remove_peer` — but those are *administrative* mutations
  (operator-grade peer-record management), not "peer X just spawned resource
  Y at runtime; record that Y is owned by X." The resource ownership path is
  a different concern from peer-record management, even if they end up
  sharing storage.
-  **The one-way door.** Whether `Identity.resources` stays static-config-
+  1. **No-specific-resource operations (the `list` case).** Operations
-  sourced or gains a dynamic-ownership supplement (and what the trait shape
+     with `resource_type` set but `resource_id_path` absent — e.g.
-  for that supplement is) determines what `AccessControl` declarations the
+     `docker/container/list`, which doesn't reference a specific container.
-  docker/tty/runner/fleet specs can make, what every `AccessControl::check`
+     There the question is "does this peer have *any* resource of this
-  call site reads, and what every consumer of `Identity` assumes. This is
+     type?" rather than "does this peer own *this* resource?" Option 2
-  core, not per-crate. The *mechanism* (a new `ResourceOwnershipProvider`
+     handles this naturally (check asks the provider "any resources of
-  trait vs. extending `IdentityStore` vs. labels/ownership-tags on the
+     type T for this identity?" when no specific ID is present), but the
-  spawned resources themselves vs. a capability-style model) is two-way —
+     exact semantics need to be pinned: does the ACL gate the whole call
-  additive, reversible per the ADR-009 definition — but the model-level
+     (allow/deny), or does the handler filter the result to owned
-  decision is one-way.
+     containers (allow + filter)? The former is the scope-gating path; the
     latter is the result-filtering path. They compose (scope-gate the
     call, then filter the result), but the ADR should state which is the
     default and how a spec declares which it wants. `list` is the case
     that forces this; `exec`/`inspect`/`stop` against a specific
     container are the clean case.
-  **Known consumers of the resolution.** Any spec that declares an
+  2. **Teardown coupling.** The ownership store's write path (revoke on
-  `AccessControl` with `resource_type`/`resource_action` against a
+     teardown) must be coupled to the spawned resource's lifecycle, not
-  runtime-spawned resource set needs this resolved first. Today that's the
+     left to operator workflows. When a container dies or is removed, the
-  docker, tty, opencode-runner, and `alknet-container` specs; future
+     ownership entry must be revoked — otherwise the store accumulates
-  runner-shaped crates (any GPU-job runner, any "spawn a process and expose
+     stale entries and an ACL check could reference a resource that no
-  it over the call protocol" crate) inherit the same requirement. The
+     longer exists. The coupling mechanism (the docker handler explicitly
-  resolution should make the model available in core/call so those specs
+     calls revoke on container exit, vs. a background reaper, vs. TTL-based
-  declare ACLs against it directly, rather than each crate inventing a
+     expiry) is two-way-door mechanism work, but the ADR should state the
-  per-crate ownership layer.
+     coupling requirement and the default mechanism.
  3. **Fleet representation (spoke resources on the hub).** When a worker
     spoke spawns a container and exposes it to the hub over the call
     protocol, the hub's ownership store needs to represent "peer X owns
     resource R" for routing/ACL on the hub side. Whether the spoke pushes
     ownership records to the hub on spawn, the hub derives them from
     `from_call`-discovered operations, or the spoke owns the ACL decision
     and the hub forwards — is a real question with cross-node state
     implications. The POC summary's §6 head-worker/machine-node model
     frames the topology; this question is where that topology meets the
     ownership model. Likely the most consequential of the three open
     questions.
  4. **Composition interaction.** ADR-015/022 populate
     `CompositionAuthority.resources` for internal composition calls. With
     dynamic ownership, an internal composition that targets a runtime-
     spawned resource (a handler composing `docker/container/exec` against
     a specific container) needs the composition authority to be checkable
     against the ownership store too, not just against the static
     `CompositionAuthority.resources` map. Whether `CompositionAuthority`
     grows a dynamic-ownership path parallel to `Identity`, or composition
     always runs under the caller's ownership, or some third option — needs
     to be stated so the privilege model stays coherent with the ownership
     model.
  These are genuine open questions, not deferred decisions — the ADR must
  answer them. They were surfaced by choosing Option 2 + `resource_id_path`
  rather than by leaving the integration point undecided; recording them
  here so the ADR drafting starts from a known set of specifics to work
  out, not from a blank page.
  **Phase 0 may be warranted.** Given the one-way door at the model level
  and the number of plausible mechanisms (each with different tradeoffs
  around consistency, teardown coupling, fleet-scale state, and how a
  remote spoke's spawned resources are represented on the hub), a targeted
  research/POC pass before the ADR is likely the right sequencing — but
  that's a decision to make once the candidate mechanisms are enumerated,
  not something this OQ pre-commits.
 - **Cross-references**: ADR-009 (door-type-as-deferral anti-pattern),
  ADR-015, ADR-022 (the static `CompositionAuthority.resources` model this
-  questions), ADR-030, ADR-035 (`IdentityStore` — administrative peer
+  extends — see open question 4), ADR-030, ADR-033 (repo/adapter pattern —
-  mutations, a different concern from runtime resource ownership),
+  reused for the ownership store), ADR-035 (`IdentityStore` —
  administrative peer mutations, a different concern from runtime resource
  ownership, but the sync-read + ArcSwap + honker-NOTIFY shape is reused),
  [auth.md](crates/core/auth.md) (`Identity.resources`, `AccessControl::check`
-  interaction),
+  interaction — both under edit by this decision),
  [operation-registry.md](crates/call/operation-registry.md) (`AccessControl`,
-  `OperationSpec`),
+  `OperationSpec` — `resource_id_path` addition),
  [alknet-docker POC summary](../../research/alknet-docker/poc-summary.md)
  §"Open Unknowns" #3