docs(arch): set OQ-42 decision direction — repo/adapter storage + Option 2 integration

Two structural decisions for dynamic resource ownership (OQ-42), recorded
in the OQ so ADR drafting starts from a clear position:

1. Storage side reuses the repo/adapter pattern (ADR-033) — a fourth
   instance alongside IdentityProvider/IdentityStore/CredentialStore.
   Trait in alknet-core with an in-memory default adapter; persistence
   adapter separable. Sync read with ArcSwap + honker-NOTIFY cache
   invalidation, same shape as ConfigIdentityProvider (ADR-035). No new
   shape invented; no Phase 0 needed for the storage side.

2. Integration point is Option 2 — AccessControl::check consults the
   ownership provider directly. Rejected Option 1 (augment identity with
   a per-request snapshot) because its purity was theatrical — the
   question 'can X exec into container C' was never purely a function of
   identity, it just looked that way because the resource set was static.
   Option 2 makes check's signature honest about what ACL checking is in
   the presence of dynamic resources. Cost is a check signature change
   (one-way door, every call site updates) — implementation cost, not
   semantic cost, per the project's decision principle.
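
A minimal Rust sketch of how the two decisions above could fit together. It is an illustration under assumptions, not the project's API. `ResourceOwnershipProvider` is one candidate name mentioned in the OQ text, `InMemoryOwnership` is a hypothetical adapter name, identity is reduced to a plain string, and `std::sync::RwLock` stands in for the `ArcSwap`-backed cache so the example needs no external crates. The real `check` signature is still to be settled by the ADR.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

/// Hypothetical trait shape; the real trait in alknet-core is not yet specified.
pub trait ResourceOwnershipProvider: Send + Sync {
    /// Sync read: does `identity` own `resource_type/resource_id` with `action`?
    fn owns(&self, identity: &str, resource_type: &str, resource_id: &str, action: &str) -> bool;
    /// Write: record that `identity` spawned the resource.
    fn record(&self, identity: &str, resource_type: &str, resource_id: &str, actions: &[&str]);
    /// Write: drop the entry when the resource is torn down.
    fn revoke(&self, resource_type: &str, resource_id: &str);
}

/// In-memory default adapter (assumed name). Ownership is runtime state only.
#[derive(Default)]
pub struct InMemoryOwnership {
    // (resource_type, resource_id) -> (owner, allowed actions)
    entries: RwLock<HashMap<(String, String), (String, HashSet<String>)>>,
}

impl ResourceOwnershipProvider for InMemoryOwnership {
    fn owns(&self, identity: &str, resource_type: &str, resource_id: &str, action: &str) -> bool {
        let key = (resource_type.to_owned(), resource_id.to_owned());
        let entries = self.entries.read().unwrap();
        entries.get(&key).map_or(false, |(owner, actions)| {
            owner.as_str() == identity && actions.contains(action)
        })
    }

    fn record(&self, identity: &str, resource_type: &str, resource_id: &str, actions: &[&str]) {
        let key = (resource_type.to_owned(), resource_id.to_owned());
        let actions: HashSet<String> = actions.iter().map(|a| a.to_string()).collect();
        self.entries.write().unwrap().insert(key, (identity.to_owned(), actions));
    }

    fn revoke(&self, resource_type: &str, resource_id: &str) {
        let key = (resource_type.to_owned(), resource_id.to_owned());
        self.entries.write().unwrap().remove(&key);
    }
}

/// Simplified stand-in for the ACL declared on an operation spec.
pub struct AccessControl {
    pub resource_type: String,
    pub resource_action: String,
}

impl AccessControl {
    /// Option 2: the ownership provider is an explicit input to the check.
    pub fn check(
        &self,
        identity: &str,
        resource_id: &str,
        ownership: &dyn ResourceOwnershipProvider,
    ) -> bool {
        ownership.owns(identity, &self.resource_type, resource_id, &self.resource_action)
    }
}
```

In this sketch the result of `check` depends on the ACL, the identity, and the current ownership state, which is the dependency Option 2 surfaces in the signature.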

Refinement that makes Option 2 clean: OperationSpec gains resource_id_path
(JSON pointer into the input, e.g. '$.containerId'). Fits naturally with
the existing JSON-Schema-backed input_schema — the pointer is within an
existing schema on the same spec. OperationSpec becomes fully
self-describing for authorization: resource type, action, and which input
field drives the resource lookup, all declared on the spec.
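
A rough Rust sketch of the dispatcher-side extraction this refinement implies. Everything here is an assumption for illustration: the `Json` enum is a tiny stand-in for a real JSON value type, `extract_resource_id` is a hypothetical helper, the `OperationSpec` fields are reduced to the ones named above, and only the simple `$.field` form from the example is handled. The exact pointer syntax is left to the ADR.

```rust
use std::collections::HashMap;

/// Tiny stand-in for a JSON value (assumption; a real implementation
/// would presumably use an existing JSON library).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Str(String),
    Num(f64),
    Object(HashMap<String, Json>),
}

/// Reduced `OperationSpec`: only the authorization-related fields.
pub struct OperationSpec {
    pub name: String,
    pub resource_type: Option<String>,
    pub resource_action: Option<String>,
    /// Where the resource ID lives in the input, e.g. "$.containerId".
    pub resource_id_path: Option<String>,
}

/// Hypothetical helper: follow the spec-declared path into the input and
/// return the resource ID if the path resolves to a string.
pub fn extract_resource_id(spec: &OperationSpec, input: &Json) -> Option<String> {
    let path = spec.resource_id_path.as_deref()?;
    let mut current = input;
    // Only the "$.a.b" form is handled in this sketch.
    for segment in path.strip_prefix("$.")?.split('.') {
        match current {
            Json::Object(fields) => current = fields.get(segment)?,
            _ => return None,
        }
    }
    match current {
        Json::Str(id) => Some(id.clone()),
        _ => None,
    }
}
```

A dispatcher built this way would need no per-namespace knowledge: it reads the path from the spec, extracts the ID, and hands it to the check.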

Four specifics remain open for the ADR: the no-specific-resource (list)
case, teardown coupling, fleet representation (spoke resources on the
hub), and composition interaction with dynamic ownership. These were
surfaced by choosing Option 2 rather than by leaving the integration point
undecided.
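
To make the first open specific concrete, here is a small Rust sketch of the two candidate semantics for the `list` case. It does not pick one; that is the ADR's job. `ListPolicy` and `authorize_list` are hypothetical names, and resources are reduced to plain ID strings.

```rust
/// Hypothetical per-spec declaration of how a no-specific-resource
/// operation is authorized.
pub enum ListPolicy {
    /// Allow or deny the whole call; the result is returned unfiltered.
    Gate,
    /// Allow the call, then keep only the caller's own resources.
    GateAndFilter,
}

/// `owned` is what the ownership provider reports for this identity and
/// resource type; `all` is what the handler would return unfiltered.
/// Returns `None` when the call is denied.
pub fn authorize_list(
    policy: &ListPolicy,
    owned: &[String],
    all: Vec<String>,
) -> Option<Vec<String>> {
    // Both candidates deny when the identity owns nothing of this type.
    if owned.is_empty() {
        return None;
    }
    match policy {
        ListPolicy::Gate => Some(all),
        ListPolicy::GateAndFilter => {
            Some(all.into_iter().filter(|id| owned.contains(id)).collect())
        }
    }
}
```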
2026-07-04 13:04:14 +00:00
parent e29672942c
commit f390550a06
2 changed files with 140 additions and 88 deletions

@@ -156,7 +156,7 @@ See [open-questions.md](open-questions.md) for the full tracker.
 - **OQ-41**: Stream operators library — a handler-level utility library (filter, map, batch, dedupe, window, etc. on `BoxStream<T>`), prior art in `@alkdev/pubsub/operators.ts`; feature extension, not an architectural decision (the architecture decision — stream composition is handler-level, not protocol-level — is made in ADR-049)
 **Open (blocking, requires ADR before the dependent crate specs):**
-- **OQ-42**: Dynamic resource ownership for runtime-spawned resources — surfaced by the alknet-docker POC (containers as `AccessControl` resources), generalized to every "spawn a thing at runtime and expose it over the call protocol" crate (docker, tty, opencode-runner wrapper, `alknet-container` fleet layer). The current `Identity.resources``AccessControl::check` model is static (config-sourced via `PeerEntry`/`CompositionAuthority`); runtime-spawned resources with derived ownership don't fit. One-way door at the model level (core/call), two-way at the mechanism level. High priority — blocks the docker/tty/runner/fleet crate specs. Likely warrants a Phase 0 research/POC pass before the ADR.
+- **OQ-42**: Dynamic resource ownership for runtime-spawned resources — surfaced by the alknet-docker POC (containers as `AccessControl` resources), generalized to every "spawn a thing at runtime and expose it over the call protocol" crate (docker, tty, opencode-runner wrapper, `alknet-container` fleet layer). The current `Identity.resources``AccessControl::check` model is static (config-sourced via `PeerEntry`/`CompositionAuthority`); runtime-spawned resources with derived ownership don't fit. **Decision direction set**: storage side reuses the repo/adapter pattern (ADR-033, fourth instance alongside `IdentityProvider`/`IdentityStore`/`CredentialStore`); integration point is Option 2 — `AccessControl::check` consults an ownership provider directly, with `OperationSpec` gaining a `resource_id_path` JSON pointer so the spec stays fully self-describing for authorization. Four specifics remain open for the ADR: the no-specific-resource (`list`) case, teardown coupling, fleet representation (spoke resources on the hub), and composition interaction with dynamic ownership. High priority — blocks the docker/tty/runner/fleet crate specs.
 **Deferred (not active):**
 - **OQ-09**: WASM target boundaries — design constraint, not deliverable

@@ -963,107 +963,159 @@ is a feature extension, not an unmade architecture decision.
 - **Origin**: [alknet-docker POC summary](../../research/alknet-docker/poc-summary.md)
 §"Open Unknowns" #3 (container-as-resource identity model); generalized
 during the Phase 1 review pass triggered by that research finding.
-- **Status**: open
-- **Door type**: One-way (the `Identity.resources``AccessControl::check`
-model in core/call), two-way (the mechanism for supplementing it)
+- **Status**: open — the two structural decisions below are made; the
+remaining questions (listed under "Open for the ADR") are what the ADR
+must settle before the dependent crate specs can be drafted.
+- **Door type**: One-way (the `AccessControl::check` signature change and
+the `OperationSpec.resource_id_path` addition in core/call), two-way (the
+ownership provider mechanism, per the established repo/adapter pattern)
 - **Priority**: high — blocks the alknet-docker, alknet-tty, opencode-runner
 wrapper, and `alknet-container` (fleet normalization) crate specs. None of
-those specs can declare their `AccessControl` shapes until this is resolved,
-because the model available to them determines what ACL declarations are
-even expressible. Permitting "docker picks a per-crate default and the
-others follow" is the door-type-as-deferral anti-pattern (ADR-009 §"What
-this framework is NOT"): each crate bakes in an ACL shape and downstream
-crates build on whatever default was picked, making the "cheap reversal"
-expensive.
-- **Resolution**: Not yet made. This OQ records the issue and its scope so the
-architecture workflow can see it; the resolution requires an ADR (and,
-given the one-way door, likely a Phase 0 research/POC pass first).
-**The issue, generalized.** The alknet-docker POC research flagged that
-containers are a natural resource for `AccessControl` (`resource_type:
-"container"`, `resource_action: "exec"`), but containers are created at
-runtime — the resource set is dynamic, and "who can exec into container C"
-is a function of "who created C," not of a static `PeerEntry.resources`
-entry an operator wrote. The POC agent's one-line summary — "the
-`IdentityProvider` model in alknet-core is currently static (`PeerEntry`
-set). Dynamic resource ownership needs a spec" — is accurate, and the
-consequence it draws is the right one.
-The issue is broader than the docker case that surfaced it. **Every crate
-that spawns a thing at runtime and exposes it over the call protocol hits
-the same shape:**
-| Crate (some prospective) | Runtime-spawned resource | Ownership derivation |
-|--------------------------|--------------------------|----------------------|
-| alknet-docker | container ID | who created the container |
-| alknet-tty | a TTY session (often bound to a container) | who opened the session |
-| opencode-runner wrapper (a runner crate that starts an opencode instance in a container and wraps its OpenAPI surface via `alknet-http`'s `from_openapi`) | an opencode instance | who started the instance |
-| `alknet-container` (fleet normalization, per the POC summary's §6 boundary) | a normalized container across multiple hosts | transitively, who created the underlying container |
-All share: (a) resources are runtime-spawned, not config-declared; (b)
-ownership is derived from creation, not from a static ACL entry; (c) the
-resource set churns continuously (instances start and stop constantly),
-so any model requiring operator config edits per resource is a non-starter;
-(d) the resource's lifecycle is bound to a process the call protocol
-itself is managing, so ownership state and spawn/teardown must be coupled,
-not two separate operator workflows.
-Solving this inside the docker crate spec would re-solve it identically in
-each of those crates and they would diverge. The model has to live in
-core/call, once.
-**What the current model does, and why it doesn't fit.** Today
-`Identity.resources: HashMap<String, Vec<String>>` is populated on two
-paths, both static: from `PeerEntry.resources` (fingerprint/auth-token,
-ADR-030) or from `CompositionAuthority.resources` (composition, ADR-015 /
-ADR-022). `AccessControl::check`
-(`crates/alknet-call/src/registry/spec.rs:59108`) does a literal
-`identity.resources.get(resource_type)` lookup and a string-equals match
-on `resource_action`. There is no notion of "the resource set is dynamic"
-in that function — it reads a map built when the identity was resolved.
-`ConfigIdentityProvider` reads from `ArcSwap<DynamicConfig>` (hot-reloadable
-config), and `IdentityStore` (ADR-035) adds async `put_peer` /
-`update_peer` / `remove_peer` — but those are *administrative* mutations
-(operator-grade peer-record management), not "peer X just spawned resource
-Y at runtime; record that Y is owned by X." The resource ownership path is
-a different concern from peer-record management, even if they end up
-sharing storage.
-**The one-way door.** Whether `Identity.resources` stays static-config-
-sourced or gains a dynamic-ownership supplement (and what the trait shape
-for that supplement is) determines what `AccessControl` declarations the
-docker/tty/runner/fleet specs can make, what every `AccessControl::check`
-call site reads, and what every consumer of `Identity` assumes. This is
-core, not per-crate. The *mechanism* (a new `ResourceOwnershipProvider`
-trait vs. extending `IdentityStore` vs. labels/ownership-tags on the
-spawned resources themselves vs. a capability-style model) is two-way —
-additive, reversible per the ADR-009 definition — but the model-level
-decision is one-way.
-**Known consumers of the resolution.** Any spec that declares an
-`AccessControl` with `resource_type`/`resource_action` against a
-runtime-spawned resource set needs this resolved first. Today that's the
-docker, tty, opencode-runner, and `alknet-container` specs; future
-runner-shaped crates (any GPU-job runner, any "spawn a process and expose
-it over the call protocol" crate) inherit the same requirement. The
-resolution should make the model available in core/call so those specs
-declare ACLs against it directly, rather than each crate inventing a
-per-crate ownership layer.
-**Phase 0 may be warranted.** Given the one-way door at the model level
-and the number of plausible mechanisms (each with different tradeoffs
-around consistency, teardown coupling, fleet-scale state, and how a
-remote spoke's spawned resources are represented on the hub), a targeted
-research/POC pass before the ADR is likely the right sequencing — but
-that's a decision to make once the candidate mechanisms are enumerated,
-not something this OQ pre-commits.
+those specs can declare their `AccessControl` shapes until this is
+resolved, because the model available to them determines what ACL
+declarations are even expressible. Permitting "docker picks a per-crate
+default and the others follow" is the door-type-as-deferral anti-pattern
+(ADR-009 §"What this framework is NOT"): each crate bakes in an ACL shape
+and downstream crates build on whatever default was picked, making the
+"cheap reversal" expensive.
+- **Resolution (decided so far).**
+**Decision 1 — storage side: reuse the repo/adapter pattern.** The
+ownership store is a fourth instance of the established repo/adapter
+pattern (ADR-033), alongside `IdentityProvider` (ADR-004), `IdentityStore`
+(ADR-035), and `CredentialStore` (ADR-031). Concretely: a trait in
+`alknet-core` (read method: "does identity X own resource R with action
+A?" / "what resources of type T does X own?"; write method: "record X
+spawned R", "revoke R on teardown"), with an in-memory default adapter in
+core. The in-memory default carries the docker/runner cases with no
+backend dependency — ownership is runtime state, meaningless across
+restarts (a container ID from a previous process doesn't exist), so the
+default case has no persistence requirement. A persistence adapter (e.g.
+sqlite/honker-backed, for a hub that wants fleet ownership to survive
+restarts) is separable and built when a concrete use case forces it, same
+as `alknet-store-sqlite` for peer/credential persistence. The read stays
+sync (called from `AccessControl::check` on the dispatch hot path, no
+`.await`), with persistence adapters caching in memory and using honker
+NOTIFY for invalidation — same `ArcSwap`-backed full-reload pattern as
+`ConfigIdentityProvider` (ADR-035). No new shape invented on the storage
+side; no Phase 0 needed for it.
+**Decision 2 — integration point: Option 2, `check` consults the
+ownership provider directly.** `AccessControl::check` grows a parameter
+for the ownership provider (or reads one carried on `OperationContext`),
+and consults it for `resource_type`/`resource_action` checks against
+runtime-spawned resources. The alternative considered and rejected —
+Option 1, augmenting `Identity.resources` with a per-request snapshot
+before calling `check` — preserves `check`'s purity by moving the
+impurity one frame up the stack: the dispatcher would pull owned
+resources into a per-request identity snapshot so `check` *looks*
+unchanged while reading state that was never part of the static
+identity. The purity was always theatrical (the question "can X exec
+into container C" was never purely a function of identity; it just
+looked that way because the resource set was static). Option 2 makes
+`check`'s signature honest about what ACL checking *is* in the presence
+of dynamic resources: a function of (ACL, Identity, current-ownership-
+state). The impurity is real either way; Option 2 puts it in the
+signature where it's visible, rather than hiding it in a per-request
+snapshot pretending to be static identity. Option 3 (handler-level
+ownership check, `AccessControl` gates only scope) was rejected because
+it splits the ACL story — some resources statically checked, some
+handler-checked — which is the kind of inconsistency that creates the
+"figure out how it fits with what is there" cleanup this OQ exists to
+prevent.
+The cost of Option 2 is a `check` signature change — a one-way door,
+every call site and test updates. Per the project's decision principle
+(implementation workload is a non-issue relative to semantic correctness
+and long-term clarity; "path of least resistance compounded over many
+decisions is strictly dominated"), this is implementation cost, not a
+semantic cost, and does not bias the choice.
+**Refinement that makes Option 2 work cleanly: `OperationSpec` declares
+where the resource ID lives in the input.** `OperationSpec` gains a
+`resource_id_path: Option<String>` — a JSON pointer into the operation
+input, e.g. `"$.containerId"` for `docker/container/exec`. The dispatcher
+extracts the resource ID from the input using the spec-declared path,
+passes it to `check`, and `check` asks the provider "does this identity
+own `<resource_type>/<resource_id>` with action `<resource_action>`?" —
+a single targeted lookup, not a whole-resource-set pull. The fit with
+JSON Schema is load-bearing, not incidental: `OperationSpec.input_schema`
+is already a JSON Schema, so `resource_id_path` is a pointer *within* an
+existing schema on the same spec. The `OperationSpec` becomes fully
+self-describing for authorization — what resource type, what action, and
+*which input field* drives the resource lookup. No per-namespace
+conventions, no handler-level knowledge, no "the dispatcher just knows."
+The contract is on the spec, where it belongs.
+- **Open for the ADR.** The decisions above settle the storage shape and
+the integration point. The ADR must still address:
+1. **No-specific-resource operations (the `list` case).** Operations
+with `resource_type` set but `resource_id_path` absent — e.g.
+`docker/container/list`, which doesn't reference a specific container.
+There the question is "does this peer have *any* resource of this
+type?" rather than "does this peer own *this* resource?" Option 2
+handles this naturally (check asks the provider "any resources of
+type T for this identity?" when no specific ID is present), but the
+exact semantics need to be pinned: does the ACL gate the whole call
+(allow/deny), or does the handler filter the result to owned
+containers (allow + filter)? The former is the scope-gating path; the
+latter is the result-filtering path. They compose (scope-gate the
+call, then filter the result), but the ADR should state which is the
+default and how a spec declares which it wants. `list` is the case
+that forces this; `exec`/`inspect`/`stop` against a specific
+container are the clean case.
+2. **Teardown coupling.** The ownership store's write path (revoke on
+teardown) must be coupled to the spawned resource's lifecycle, not
+left to operator workflows. When a container dies or is removed, the
+ownership entry must be revoked — otherwise the store accumulates
+stale entries and an ACL check could reference a resource that no
+longer exists. The coupling mechanism (the docker handler explicitly
+calls revoke on container exit, vs. a background reaper, vs. TTL-based
+expiry) is two-way-door mechanism work, but the ADR should state the
+coupling requirement and the default mechanism.
+3. **Fleet representation (spoke resources on the hub).** When a worker
+spoke spawns a container and exposes it to the hub over the call
+protocol, the hub's ownership store needs to represent "peer X owns
+resource R" for routing/ACL on the hub side. Whether the spoke pushes
+ownership records to the hub on spawn, the hub derives them from
+`from_call`-discovered operations, or the spoke owns the ACL decision
+and the hub forwards — is a real question with cross-node state
+implications. The POC summary's §6 head-worker/machine-node model
+frames the topology; this question is where that topology meets the
+ownership model. Likely the most consequential of the three open
+questions.
+4. **Composition interaction.** ADR-015/022 populate
+`CompositionAuthority.resources` for internal composition calls. With
+dynamic ownership, an internal composition that targets a runtime-
+spawned resource (a handler composing `docker/container/exec` against
+a specific container) needs the composition authority to be checkable
+against the ownership store too, not just against the static
+`CompositionAuthority.resources` map. Whether `CompositionAuthority`
+grows a dynamic-ownership path parallel to `Identity`, or composition
+always runs under the caller's ownership, or some third option — needs
+to be stated so the privilege model stays coherent with the ownership
+model.
+These are genuine open questions, not deferred decisions — the ADR must
+answer them. They were surfaced by choosing Option 2 + `resource_id_path`
+rather than by leaving the integration point undecided; recording them
+here so the ADR drafting starts from a known set of specifics to work
+out, not from a blank page.
 - **Cross-references**: ADR-009 (door-type-as-deferral anti-pattern),
 ADR-015, ADR-022 (the static `CompositionAuthority.resources` model this
-questions), ADR-030, ADR-035 (`IdentityStore` — administrative peer
-mutations, a different concern from runtime resource ownership),
+extends — see open question 4), ADR-030, ADR-033 (repo/adapter pattern —
+reused for the ownership store), ADR-035 (`IdentityStore`
+administrative peer mutations, a different concern from runtime resource
+ownership, but the sync-read + ArcSwap + honker-NOTIFY shape is reused),
 [auth.md](crates/core/auth.md) (`Identity.resources`, `AccessControl::check`
-interaction),
+interaction — both under edit by this decision),
 [operation-registry.md](crates/call/operation-registry.md) (`AccessControl`,
-`OperationSpec`),
+`OperationSpec``resource_id_path` addition),
 [alknet-docker POC summary](../../research/alknet-docker/poc-summary.md)
 §"Open Unknowns" #3