docs(arch): resolve OQ-42 — proxy-only ownership model for runtime-spawned resources

Lock in the dynamic resource ownership model for runtime-spawned
resources (containers, TTYs, runner workspaces). Three decisions:

1. Storage: reuse the repo/adapter pattern (ADR-033; a fourth instance
   alongside IdentityProvider/IdentityStore/CredentialStore) with an
   in-memory default adapter and a sync read on the dispatch hot path.
2. Integration: AccessControl::check consults an ownership provider
   directly (Option 2); OperationSpec gains resource_id_path (a JSON
   pointer into the input) so the spec is fully self-describing for
   authorization.
3. Access pattern: proxy-only. The spawner owns, sharing happens by
   proxy via from_call + forwarded_for (ADR-032), and teardown revokes.
   There is no grant mechanism in core; 'poking holes' is a
   downstream-app concern. A future grant is additive (a new trait
   method); that is stated as a reversal-cost classification, not a
   deferral.

Four edge specifics pinned:

- list = scope-gate + result-filter
- teardown = automatic, handler-driven
- fleet = per-node ownership; the downstream app tracks 'who is this for'
- composition = two orthogonal checks; ADR-015/022 unchanged

Removes the prior hedging language ('decision direction set', 'open for
the ADR') and the contingent qualifiers from specifics 3/4 now that the
proxy-vs-grant call is made. The dependent crate specs (docker, tty,
runner, fleet) can declare their AccessControl shapes against this
model.
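
A minimal sketch of how the storage, integration, list, and teardown
decisions above could fit together, in Rust. Every name in it
(`OwnershipStore`, `InMemoryOwnership`, `CallContext`, `Decision`, `check`,
`resolve`, `on_container_exit`) is hypothetical, not the core crate's API.
`Value` is a simplified stand-in for the JSON input (real code would
presumably use something like serde_json's `Value::pointer` in place of
`resolve`), and a `std::sync::RwLock` stands in for the ArcSwap-backed
sync read path.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};
use std::sync::RwLock;

/// Simplified stand-in for the operation's JSON input (strings and objects only).
#[derive(Debug, Clone)]
pub enum Value {
    Str(String),
    Obj(BTreeMap<String, Value>),
}

/// Resolve a non-empty RFC 6901 JSON Pointer such as "/container_id" (objects only).
pub fn resolve<'a>(value: &'a Value, pointer: &str) -> Option<&'a Value> {
    let mut cur = value;
    for token in pointer.strip_prefix('/')?.split('/') {
        let name = token.replace("~1", "/").replace("~0", "~"); // RFC 6901: "~1" first, then "~0"
        match cur {
            Value::Obj(map) => cur = map.get(&name)?,
            Value::Str(_) => return None,
        }
    }
    Some(cur)
}

/// Hypothetical ownership provider: writes on spawn/teardown, sync reads on dispatch.
pub trait OwnershipStore: Send + Sync {
    fn record(&self, owner: &str, resource_type: &str, id: &str);
    fn revoke(&self, resource_type: &str, id: &str);
    fn owner_of(&self, resource_type: &str, id: &str) -> Option<String>;
    fn owned_by(&self, owner: &str, resource_type: &str) -> HashSet<String>;
}

/// In-memory default adapter; the RwLock stands in for an ArcSwap snapshot.
#[derive(Default)]
pub struct InMemoryOwnership {
    entries: RwLock<HashMap<(String, String), String>>, // (type, id) -> owner
}

impl OwnershipStore for InMemoryOwnership {
    fn record(&self, owner: &str, resource_type: &str, id: &str) {
        self.entries.write().unwrap().insert((resource_type.to_string(), id.to_string()), owner.to_string());
    }
    fn revoke(&self, resource_type: &str, id: &str) {
        self.entries.write().unwrap().remove(&(resource_type.to_string(), id.to_string()));
    }
    fn owner_of(&self, resource_type: &str, id: &str) -> Option<String> {
        self.entries.read().unwrap().get(&(resource_type.to_string(), id.to_string())).cloned()
    }
    fn owned_by(&self, owner: &str, resource_type: &str) -> HashSet<String> {
        self.entries.read().unwrap().iter()
            .filter(|((t, _), o)| t == resource_type && o.as_str() == owner)
            .map(|((_, id), _)| id.clone())
            .collect()
    }
}

/// Hypothetical slice of OperationSpec relevant to authorization.
pub struct OperationSpec {
    pub scope: &'static str,                    // e.g. "container:exec"
    pub resource_type: Option<&'static str>,    // e.g. "container"
    pub resource_id_path: Option<&'static str>, // JSON pointer into the input
}

/// Ownership is checked against the direct caller, never against forwarded_for.
pub struct CallContext<'a> {
    pub caller: &'a str,
    pub scopes: &'a HashSet<String>,    // Identity scopes, or CompositionAuthority's
    pub forwarded_for: Option<&'a str>, // end-user metadata (ADR-032); not read here
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Deny,
    Allow,
    AllowFiltered(HashSet<String>), // list case: return only these ids
}

pub fn check(ctx: &CallContext, spec: &OperationSpec, input: &Value, store: &dyn OwnershipStore) -> Decision {
    if !ctx.scopes.contains(spec.scope) {
        return Decision::Deny; // check 1: static scope gate, unchanged
    }
    let Some(resource_type) = spec.resource_type else {
        return Decision::Allow; // not a resource-scoped operation
    };
    match spec.resource_id_path {
        // check 2, targeted: one ownership lookup on the id the pointer selects
        Some(path) => match resolve(input, path) {
            Some(Value::Str(id)) if store.owner_of(resource_type, id).as_deref() == Some(ctx.caller) => Decision::Allow,
            _ => Decision::Deny,
        },
        // check 2, no specific resource (list): allow, but filter to owned ids
        None => Decision::AllowFiltered(store.owned_by(ctx.caller, resource_type)),
    }
}

/// Teardown: the handler that ends the container's lifecycle revokes its entry.
pub fn on_container_exit(store: &dyn OwnershipStore, container_id: &str) {
    store.revoke("container", container_id);
}
```

Under these assumptions the proxy case passes because `check` compares
ownership against the direct `caller` only, while `forwarded_for` rides
along unread for the coordinator's own ACL layer. For an internal
composition call the same function applies with the coordinator's
`CompositionAuthority` scopes passed as `scopes`, so the static scope gate
and the dynamic ownership lookup remain two independent conditions that
must both hold.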
2026-07-04 16:02:38 +00:00
parent 3daecd7ab2
commit 13dd15ab0b
2 changed files with 126 additions and 68 deletions
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -1,6 +1,6 @@
 ---
 status: draft
-last_updated: 2026-07-04
+last_updated: 2026-07-05
 ---

 # Open Questions
@@ -973,9 +973,10 @@ is a feature extension, not an unmade architecture decision.
 - **Origin**: [alknet-docker POC summary](../../research/alknet-docker/poc-summary.md)
  §"Open Unknowns" #3 (container-as-resource identity model); generalized
  during the Phase 1 review pass triggered by that research finding.
-- **Status**: open — the two structural decisions below are made; the
-  remaining questions (listed under "Open for the ADR") are what the ADR
-  must settle before the dependent crate specs can be drafted.
+- **Status**: resolved. Every sub-question is decided (storage shape,
+  integration point, proxy-vs-grant, and the four edge specifics). The
+  ADR will write these into decision text; the dependent crate specs
+  can declare their `AccessControl` shapes against this model.
 - **Door type**: One-way (the `AccessControl::check` signature change and
  the `OperationSpec.resource_id_path` addition in core/call), two-way (the
  ownership provider mechanism, per the established repo/adapter pattern)
@@ -989,7 +990,7 @@ is a feature extension, not an unmade architecture decision.
  and downstream crates build on whatever default was picked, making the
  "cheap reversal" expensive.

-- **Resolution (decided so far).**
+- **Resolution.**

  **Decision 1 — storage side: reuse the repo/adapter pattern.** The
  ownership store is a fourth instance of the established repo/adapter
@@ -1058,74 +1059,131 @@ is a feature extension, not an unmade architecture decision.
  conventions, no handler-level knowledge, no "the dispatcher just knows."
  The contract is on the spec, where it belongs.

-- **Open for the ADR.** The decisions above settle the storage shape and
-  the integration point. The ADR must still address:
+- **Resolved specifics (the four questions the ADR must write into
+  decision text).** The decisions above settle the storage shape and the
+  integration point. The four specifics below settle how the model
+  behaves at the edges:

-  1. **No-specific-resource operations (the `list` case).** Operations
-     with `resource_type` set but `resource_id_path` absent — e.g.
-     `docker/container/list`, which doesn't reference a specific container.
-     There the question is "does this peer have *any* resource of this
-     type?" rather than "does this peer own *this* resource?" Option 2
-     handles this naturally (check asks the provider "any resources of
-     type T for this identity?" when no specific ID is present), but the
-     exact semantics need to be pinned: does the ACL gate the whole call
-     (allow/deny), or does the handler filter the result to owned
-     containers (allow + filter)? The former is the scope-gating path; the
-     latter is the result-filtering path. They compose (scope-gate the
-     call, then filter the result), but the ADR should state which is the
-     default and how a spec declares which it wants. `list` is the case
+  1. **No-specific-resource operations (the `list` case) — scope-gate +
+     result-filter, composing.** Operations with `resource_type` set but
+     `resource_id_path` absent — e.g. `docker/container/list`, which
+     doesn't reference a specific container. When a coordinator calls
+     `container/list`, it should see only the containers it owns, not
+     every container on the host. That is neither pure scope-gating ("can
+     you call `container/list` at all?") nor pure result-filtering ("return
+     only owned"); it is both: scope-gate the call (does the peer have the
+     `container:list` scope?), then filter the result to owned resources.
+     The default is "allow if scoped, filter to owned." `list` is the case
     that forces this; `exec`/`inspect`/`stop` against a specific
-     container are the clean case.
+     container are the clean case (a single targeted ownership lookup via
+     `resource_id_path`). The ADR states this default and how a spec
+     declares which behavior it wants instead.

-  2. **Teardown coupling.** The ownership store's write path (revoke on
-     teardown) must be coupled to the spawned resource's lifecycle, not
-     left to operator workflows. When a container dies or is removed, the
-     ownership entry must be revoked — otherwise the store accumulates
-     stale entries and an ACL check could reference a resource that no
-     longer exists. The coupling mechanism (the docker handler explicitly
-     calls revoke on container exit, vs. a background reaper, vs. TTL-based
-     expiry) is two-way-door mechanism work, but the ADR should state the
-     coupling requirement and the default mechanism.
+  2. **Teardown coupling — automatic, handler-driven.** The ownership
+     store's write path (revoke on teardown) is coupled to the spawned
+     resource's lifecycle. The "burn it and start over" capability depends
+     on ownership state tracking the lifecycle correctly. When a container
+     dies or is destroyed, the ownership entry is revoked *by the handler
+     that managed the lifecycle* (the docker handler calls revoke on
+     container exit), not by an operator workflow or a background reaper.
+     The burn-and-start-over pattern is: destroy container → ownership
+     revoked automatically → spawn new container → new ownership recorded.
+     If teardown weren't automatic, stale ownership entries would
+     accumulate and the "burn" path would leave dangling ACL state. The
+     architectural commitment is: handler-driven revoke on lifecycle end,
+     not a reaper. The coupling mechanism (explicit handler call vs.
+     lifecycle-hook abstraction) is two-way-door implementation work.

-  3. **Fleet representation (spoke resources on the hub).** When a worker
-     spoke spawns a container and exposes it to the hub over the call
-     protocol, the hub's ownership store needs to represent "peer X owns
-     resource R" for routing/ACL on the hub side. Whether the spoke pushes
-     ownership records to the hub on spawn, the hub derives them from
-     `from_call`-discovered operations, or the spoke owns the ACL decision
-     and the hub forwards — is a real question with cross-node state
-     implications. The POC summary's §6 head-worker/machine-node model
-     frames the topology; this question is where that topology meets the
-     ownership model. Likely the most consequential of the three open
-     questions.
+  3. **Fleet representation (spoke resources on the hub) — per-node
+     ownership, downstream app tracks "who is this for."** Under the
+     proxy pattern (Decision 3 below), the docker node records "coordinator
+     owns C" in its local ownership store. The coordinator's "I started C
+     for agent Y" mapping lives in the coordinator's own downstream-app
+     state, not in the core ownership store. The ownership store is
+     per-node (each docker node records its local ownership); the hub's
+     agent-to-workspace mapping is app state. There is no cross-node
+     ownership propagation in the base model — the spoke sees the hub as
+     the owner, and the hub's "who is this for" is its own concern. The
+     proxy pattern keeps ownership local, which is why this question is
+     less consequential than originally framed.

-  4. **Composition interaction.** ADR-015/022 populate
-     `CompositionAuthority.resources` for internal composition calls. With
-     dynamic ownership, an internal composition that targets a runtime-
-     spawned resource (a handler composing `docker/container/exec` against
-     a specific container) needs the composition authority to be checkable
-     against the ownership store too, not just against the static
-     `CompositionAuthority.resources` map. Whether `CompositionAuthority`
-     grows a dynamic-ownership path parallel to `Identity`, or composition
-     always runs under the caller's ownership, or some third option — needs
-     to be stated so the privilege model stays coherent with the ownership
-     model.
+  4. **Composition interaction — two separate checks, no change to
+     `CompositionAuthority`.** In the proxy pattern, the coordinator
+     composes `docker/container/exec` on behalf of an agent. Two checks
+     must pass: (a) the coordinator's `CompositionAuthority` has the
+     `container:exec` scope (static, ADR-015/022 unchanged), and (b) the
+     coordinator owns this specific container (dynamic, ownership store).
+     The composition authority stays static — it doesn't grow a dynamic
+     path. The ownership store handles the dynamic resource-level check.
+     Both must pass; they're orthogonal. **ADR-015/022 don't need
+     amendment** — the composition authority is unchanged, and the
+     ownership store is an additional check, not a modification to the
+     existing one.

-  These are genuine open questions, not deferred decisions — the ADR must
-  answer them. They were surfaced by choosing Option 2 + `resource_id_path`
-  rather than by leaving the integration point undecided; recording them
-  here so the ADR drafting starts from a known set of specifics to work
-  out, not from a blank page.
+- **Decision 3 — access pattern: proxy-only as the base model.** The base
+  model is "spawner owns, proxy to share, teardown revokes" — with no
+  grant/transfer mechanism in the core ownership store. Two patterns for
+  how a downstream consumer reaches a runtime-spawned resource were
+  identified:
+
+  - **Proxy pattern (the common case, and the only one the core model
+    supports).** A coordinator starts a container and manages its
+    lifecycle; the end user never talks to docker directly. The
+    coordinator re-exports the docker operations it wants to expose (via
+    `from_call` — the adapter that imports a peer's operations and
+    re-registers them locally, ADR-017 — or by composing them in its own
+    handlers). When the end user invokes one, the coordinator is the
+    *direct caller* to the docker endpoint, so docker's ownership store
+    sees the coordinator as both owner and caller and the check passes.
+    The end user's identity rides as `forwarded_for` metadata (ADR-032),
+    and the coordinator applies whatever end-user-level ACL it wants at
+    its own layer. This is the kernel/user-land + forwarded-for model:
+    the coordinator's authority is used and `forwarded_for` is metadata.
+
+  - **Grant pattern ("poking holes") — not in the core model.** A
+    downstream app wants to give an end user *direct* call-protocol
+    access to the docker endpoint for specific containers — the end user
+    calls `docker/container/exec` themselves, not through a proxy. Docker's
+    ownership store would need a record that the end user has access to
+    that container, even though the downstream app spawned it. No
+    described use case requires this. The agent-workspace case — the
+    concrete one — is entirely the proxy pattern: the coordinator starts
+    the workspace container; the agent interacts with what's *inside* the
+    container (a TTY, an opencode instance's API surface), not with
+    docker operations on the container. Docker-level operations (stop,
+    remove, inspect) are the coordinator's job.
+
+  "Poking holes" is a downstream-app concern: the app that owns the
+  resources re-exports the operations it wants to share via `from_call`
+  and applies its own ACL layer, rather than the core ownership store
+  growing a grant API. The ADR commits to proxy-only and states this
+  division of responsibility explicitly.
+
+  **A future grant mechanism is additive, not a one-way door closure.**
+  If a use case forces the grant pattern, it's a new method on the
+  ownership store trait (`grant(identity, resource)` /
+  `revoke_grant(...)`). `AccessControl::check` already consults the
+  ownership provider; a grant-aware provider would answer "yes" for
+  grantees in addition to owners, with no change to `AccessControl::check`
+  or to the ownership query it makes. The additive (two-way-door)
+  classification records reversal cost, not a reason to defer: the
+  decision is made (proxy-only), and reversing it if a future use case
+  forces it is cheap. If the grant pattern is later admitted, specifics
+  3 and 4 above are revisited: cross-node ownership propagation returns
+  to the table (3), and composition under a grant would need
+  `CompositionAuthority` to grow a dynamic path, amending ADR-015/022 (4).

 - **Cross-references**: ADR-009 (door-type-as-deferral anti-pattern),
  ADR-015, ADR-022 (the static `CompositionAuthority.resources` model this
-  extends — see open question 4), ADR-030, ADR-033 (repo/adapter pattern —
-  reused for the ownership store), ADR-035 (`IdentityStore` —
-  administrative peer mutations, a different concern from runtime resource
-  ownership, but the sync-read + ArcSwap + honker-NOTIFY shape is reused),
-  [auth.md](crates/core/auth.md) (`Identity.resources`, `AccessControl::check`
-  interaction — both under edit by this decision),
-  [operation-registry.md](crates/call/operation-registry.md) (`AccessControl`,
-  `OperationSpec` — `resource_id_path` addition),
+  extends; see resolved specific 4), ADR-030, ADR-032 (`forwarded_for`
+  metadata — the proxy pattern's end-user-identity carrier), ADR-033
+  (repo/adapter pattern — reused for the ownership store), ADR-035
+  (`IdentityStore` — administrative peer mutations, a different concern
+  from runtime resource ownership, but the sync-read + ArcSwap +
+  honker-NOTIFY shape is reused),
+  [auth.md](crates/core/auth.md) (`Identity.resources`,
+  `AccessControl::check` interaction — both under edit by this decision),
+  [operation-registry.md](crates/call/operation-registry.md)
+  (`AccessControl`, `OperationSpec` — `resource_id_path` addition),
  [alknet-docker POC summary](../../research/alknet-docker/poc-summary.md)
  §"Open Unknowns" #3