docs(arch): set OQ-42 decision direction — repo/adapter storage + Option 2 integration
Two structural decisions for dynamic resource ownership (OQ-42), recorded in the OQ so ADR drafting starts from a clear position: 1. Storage side reuses the repo/adapter pattern (ADR-033) — a fourth instance alongside IdentityProvider/IdentityStore/CredentialStore. Trait in alknet-core with an in-memory default adapter; persistence adapter separable. Sync read with ArcSwap + honker-NOTIFY cache invalidation, same shape as ConfigIdentityProvider (ADR-035). No new shape invented; no Phase 0 needed for the storage side. 2. Integration point is Option 2 — AccessControl::check consults the ownership provider directly. Rejected Option 1 (augment identity with a per-request snapshot) because its purity was theatrical — the question 'can X exec into container C' was never purely a function of identity, it just looked that way because the resource set was static. Option 2 makes check's signature honest about what ACL checking is in the presence of dynamic resources. Cost is a check signature change (one-way door, every call site updates) — implementation cost, not semantic cost, per the project's decision principle. Refinement that makes Option 2 clean: OperationSpec gains resource_id_path (JSON pointer into the input, e.g. '$.containerId'). Fits naturally with the existing JSON-Schema-backed input_schema — the pointer is within an existing schema on the same spec. OperationSpec becomes fully self-describing for authorization: resource type, action, and which input field drives the resource lookup, all declared on the spec. Four specifics remain open for the ADR: the no-specific-resource (list) case, teardown coupling, fleet representation (spoke resources on the hub), and composition interaction with dynamic ownership. These were surfaced by choosing Option 2 rather than by leaving the integration point undecided.
This commit is contained in:
@@ -156,7 +156,7 @@ See [open-questions.md](open-questions.md) for the full tracker.
|
|||||||
- **OQ-41**: Stream operators library — a handler-level utility library (filter, map, batch, dedupe, window, etc. on `BoxStream<T>`), prior art in `@alkdev/pubsub/operators.ts`; feature extension, not an architectural decision (the architecture decision — stream composition is handler-level, not protocol-level — is made in ADR-049)
|
- **OQ-41**: Stream operators library — a handler-level utility library (filter, map, batch, dedupe, window, etc. on `BoxStream<T>`), prior art in `@alkdev/pubsub/operators.ts`; feature extension, not an architectural decision (the architecture decision — stream composition is handler-level, not protocol-level — is made in ADR-049)
|
||||||
|
|
||||||
**Open (blocking, requires ADR before the dependent crate specs):**
|
**Open (blocking, requires ADR before the dependent crate specs):**
|
||||||
- **OQ-42**: Dynamic resource ownership for runtime-spawned resources — surfaced by the alknet-docker POC (containers as `AccessControl` resources), generalized to every "spawn a thing at runtime and expose it over the call protocol" crate (docker, tty, opencode-runner wrapper, `alknet-container` fleet layer). The current `Identity.resources` → `AccessControl::check` model is static (config-sourced via `PeerEntry`/`CompositionAuthority`); runtime-spawned resources with derived ownership don't fit. One-way door at the model level (core/call), two-way at the mechanism level. High priority — blocks the docker/tty/runner/fleet crate specs. Likely warrants a Phase 0 research/POC pass before the ADR.
|
- **OQ-42**: Dynamic resource ownership for runtime-spawned resources — surfaced by the alknet-docker POC (containers as `AccessControl` resources), generalized to every "spawn a thing at runtime and expose it over the call protocol" crate (docker, tty, opencode-runner wrapper, `alknet-container` fleet layer). The current `Identity.resources` → `AccessControl::check` model is static (config-sourced via `PeerEntry`/`CompositionAuthority`); runtime-spawned resources with derived ownership don't fit. **Decision direction set**: storage side reuses the repo/adapter pattern (ADR-033, fourth instance alongside `IdentityProvider`/`IdentityStore`/`CredentialStore`); integration point is Option 2 — `AccessControl::check` consults an ownership provider directly, with `OperationSpec` gaining a `resource_id_path` JSON pointer so the spec stays fully self-describing for authorization. Four specifics remain open for the ADR: the no-specific-resource (`list`) case, teardown coupling, fleet representation (spoke resources on the hub), and composition interaction with dynamic ownership. High priority — blocks the docker/tty/runner/fleet crate specs.
|
||||||
|
|
||||||
**Deferred (not active):**
|
**Deferred (not active):**
|
||||||
- **OQ-09**: WASM target boundaries — design constraint, not deliverable
|
- **OQ-09**: WASM target boundaries — design constraint, not deliverable
|
||||||
|
|||||||
@@ -963,107 +963,159 @@ is a feature extension, not an unmade architecture decision.
|
|||||||
- **Origin**: [alknet-docker POC summary](../../research/alknet-docker/poc-summary.md)
|
- **Origin**: [alknet-docker POC summary](../../research/alknet-docker/poc-summary.md)
|
||||||
§"Open Unknowns" #3 (container-as-resource identity model); generalized
|
§"Open Unknowns" #3 (container-as-resource identity model); generalized
|
||||||
during the Phase 1 review pass triggered by that research finding.
|
during the Phase 1 review pass triggered by that research finding.
|
||||||
- **Status**: open
|
- **Status**: open — the two structural decisions below are made; the
|
||||||
- **Door type**: One-way (the `Identity.resources` → `AccessControl::check`
|
remaining questions (listed under "Open for the ADR") are what the ADR
|
||||||
model in core/call), two-way (the mechanism for supplementing it)
|
must settle before the dependent crate specs can be drafted.
|
||||||
|
- **Door type**: One-way (the `AccessControl::check` signature change and
|
||||||
|
the `OperationSpec.resource_id_path` addition in core/call), two-way (the
|
||||||
|
ownership provider mechanism, per the established repo/adapter pattern)
|
||||||
- **Priority**: high — blocks the alknet-docker, alknet-tty, opencode-runner
|
- **Priority**: high — blocks the alknet-docker, alknet-tty, opencode-runner
|
||||||
wrapper, and `alknet-container` (fleet normalization) crate specs. None of
|
wrapper, and `alknet-container` (fleet normalization) crate specs. None of
|
||||||
those specs can declare their `AccessControl` shapes until this is resolved,
|
those specs can declare their `AccessControl` shapes until this is
|
||||||
because the model available to them determines what ACL declarations are
|
resolved, because the model available to them determines what ACL
|
||||||
even expressible. Permitting "docker picks a per-crate default and the
|
declarations are even expressible. Permitting "docker picks a per-crate
|
||||||
others follow" is the door-type-as-deferral anti-pattern (ADR-009 §"What
|
default and the others follow" is the door-type-as-deferral anti-pattern
|
||||||
this framework is NOT"): each crate bakes in an ACL shape and downstream
|
(ADR-009 §"What this framework is NOT"): each crate bakes in an ACL shape
|
||||||
crates build on whatever default was picked, making the "cheap reversal"
|
and downstream crates build on whatever default was picked, making the
|
||||||
expensive.
|
"cheap reversal" expensive.
|
||||||
- **Resolution**: Not yet made. This OQ records the issue and its scope so the
|
|
||||||
architecture workflow can see it; the resolution requires an ADR (and,
|
|
||||||
given the one-way door, likely a Phase 0 research/POC pass first).
|
|
||||||
|
|
||||||
**The issue, generalized.** The alknet-docker POC research flagged that
|
- **Resolution (decided so far).**
|
||||||
containers are a natural resource for `AccessControl` (`resource_type:
|
|
||||||
"container"`, `resource_action: "exec"`), but containers are created at
|
|
||||||
runtime — the resource set is dynamic, and "who can exec into container C"
|
|
||||||
is a function of "who created C," not of a static `PeerEntry.resources`
|
|
||||||
entry an operator wrote. The POC agent's one-line summary — "the
|
|
||||||
`IdentityProvider` model in alknet-core is currently static (`PeerEntry`
|
|
||||||
set). Dynamic resource ownership needs a spec" — is accurate, and the
|
|
||||||
consequence it draws is the right one.
|
|
||||||
|
|
||||||
The issue is broader than the docker case that surfaced it. **Every crate
|
**Decision 1 — storage side: reuse the repo/adapter pattern.** The
|
||||||
that spawns a thing at runtime and exposes it over the call protocol hits
|
ownership store is a fourth instance of the established repo/adapter
|
||||||
the same shape:**
|
pattern (ADR-033), alongside `IdentityProvider` (ADR-004), `IdentityStore`
|
||||||
|
(ADR-035), and `CredentialStore` (ADR-031). Concretely: a trait in
|
||||||
|
`alknet-core` (read method: "does identity X own resource R with action
|
||||||
|
A?" / "what resources of type T does X own?"; write method: "record X
|
||||||
|
spawned R", "revoke R on teardown"), with an in-memory default adapter in
|
||||||
|
core. The in-memory default carries the docker/runner cases with no
|
||||||
|
backend dependency — ownership is runtime state, meaningless across
|
||||||
|
restarts (a container ID from a previous process doesn't exist), so the
|
||||||
|
default case has no persistence requirement. A persistence adapter (e.g.
|
||||||
|
sqlite/honker-backed, for a hub that wants fleet ownership to survive
|
||||||
|
restarts) is separable and built when a concrete use case forces it, same
|
||||||
|
as `alknet-store-sqlite` for peer/credential persistence. The read stays
|
||||||
|
sync (called from `AccessControl::check` on the dispatch hot path, no
|
||||||
|
`.await`), with persistence adapters caching in memory and using honker
|
||||||
|
NOTIFY for invalidation — same `ArcSwap`-backed full-reload pattern as
|
||||||
|
`ConfigIdentityProvider` (ADR-035). No new shape invented on the storage
|
||||||
|
side; no Phase 0 needed for it.
|
||||||
|
|
||||||
| Crate (some prospective) | Runtime-spawned resource | Ownership derivation |
|
**Decision 2 — integration point: Option 2, `check` consults the
|
||||||
|--------------------------|--------------------------|----------------------|
|
ownership provider directly.** `AccessControl::check` grows a parameter
|
||||||
| alknet-docker | container ID | who created the container |
|
for the ownership provider (or reads one carried on `OperationContext`),
|
||||||
| alknet-tty | a TTY session (often bound to a container) | who opened the session |
|
and consults it for `resource_type`/`resource_action` checks against
|
||||||
| opencode-runner wrapper (a runner crate that starts an opencode instance in a container and wraps its OpenAPI surface via `alknet-http`'s `from_openapi`) | an opencode instance | who started the instance |
|
runtime-spawned resources. The alternative considered and rejected —
|
||||||
| `alknet-container` (fleet normalization, per the POC summary's §6 boundary) | a normalized container across multiple hosts | transitively, who created the underlying container |
|
Option 1, augmenting `Identity.resources` with a per-request snapshot
|
||||||
|
before calling `check` — preserves `check`'s purity by moving the
|
||||||
|
impurity one frame up the stack: the dispatcher would pull owned
|
||||||
|
resources into a per-request identity snapshot so `check` *looks*
|
||||||
|
unchanged while reading state that was never part of the static
|
||||||
|
identity. The purity was always theatrical (the question "can X exec
|
||||||
|
into container C" was never purely a function of identity; it just
|
||||||
|
looked that way because the resource set was static). Option 2 makes
|
||||||
|
`check`'s signature honest about what ACL checking *is* in the presence
|
||||||
|
of dynamic resources: a function of (ACL, Identity, current-ownership-
|
||||||
|
state). The impurity is real either way; Option 2 puts it in the
|
||||||
|
signature where it's visible, rather than hiding it in a per-request
|
||||||
|
snapshot pretending to be static identity. Option 3 (handler-level
|
||||||
|
ownership check, `AccessControl` gates only scope) was rejected because
|
||||||
|
it splits the ACL story — some resources statically checked, some
|
||||||
|
handler-checked — which is the kind of inconsistency that creates the
|
||||||
|
"figure out how it fits with what is there" cleanup this OQ exists to
|
||||||
|
prevent.
|
||||||
|
|
||||||
All share: (a) resources are runtime-spawned, not config-declared; (b)
|
The cost of Option 2 is a `check` signature change — a one-way door,
|
||||||
ownership is derived from creation, not from a static ACL entry; (c) the
|
every call site and test updates. Per the project's decision principle
|
||||||
resource set churns continuously (instances start and stop constantly),
|
(implementation workload is a non-issue relative to semantic correctness
|
||||||
so any model requiring operator config edits per resource is a non-starter;
|
and long-term clarity; "path of least resistance compounded over many
|
||||||
(d) the resource's lifecycle is bound to a process the call protocol
|
decisions is strictly dominated"), this is implementation cost, not a
|
||||||
itself is managing, so ownership state and spawn/teardown must be coupled,
|
semantic cost, and does not bias the choice.
|
||||||
not two separate operator workflows.
|
|
||||||
|
|
||||||
Solving this inside the docker crate spec would re-solve it identically in
|
**Refinement that makes Option 2 work cleanly: `OperationSpec` declares
|
||||||
each of those crates and they would diverge. The model has to live in
|
where the resource ID lives in the input.** `OperationSpec` gains a
|
||||||
core/call, once.
|
`resource_id_path: Option<String>` — a JSON pointer into the operation
|
||||||
|
input, e.g. `"$.containerId"` for `docker/container/exec`. The dispatcher
|
||||||
|
extracts the resource ID from the input using the spec-declared path,
|
||||||
|
passes it to `check`, and `check` asks the provider "does this identity
|
||||||
|
own `<resource_type>/<resource_id>` with action `<resource_action>`?" —
|
||||||
|
a single targeted lookup, not a whole-resource-set pull. The fit with
|
||||||
|
JSON Schema is load-bearing, not incidental: `OperationSpec.input_schema`
|
||||||
|
is already a JSON Schema, so `resource_id_path` is a pointer *within* an
|
||||||
|
existing schema on the same spec. The `OperationSpec` becomes fully
|
||||||
|
self-describing for authorization — what resource type, what action, and
|
||||||
|
*which input field* drives the resource lookup. No per-namespace
|
||||||
|
conventions, no handler-level knowledge, no "the dispatcher just knows."
|
||||||
|
The contract is on the spec, where it belongs.
|
||||||
|
|
||||||
**What the current model does, and why it doesn't fit.** Today
|
- **Open for the ADR.** The decisions above settle the storage shape and
|
||||||
`Identity.resources: HashMap<String, Vec<String>>` is populated on two
|
the integration point. The ADR must still address:
|
||||||
paths, both static: from `PeerEntry.resources` (fingerprint/auth-token,
|
|
||||||
ADR-030) or from `CompositionAuthority.resources` (composition, ADR-015 /
|
|
||||||
ADR-022). `AccessControl::check`
|
|
||||||
(`crates/alknet-call/src/registry/spec.rs:59–108`) does a literal
|
|
||||||
`identity.resources.get(resource_type)` lookup and a string-equals match
|
|
||||||
on `resource_action`. There is no notion of "the resource set is dynamic"
|
|
||||||
in that function — it reads a map built when the identity was resolved.
|
|
||||||
`ConfigIdentityProvider` reads from `ArcSwap<DynamicConfig>` (hot-reloadable
|
|
||||||
config), and `IdentityStore` (ADR-035) adds async `put_peer` /
|
|
||||||
`update_peer` / `remove_peer` — but those are *administrative* mutations
|
|
||||||
(operator-grade peer-record management), not "peer X just spawned resource
|
|
||||||
Y at runtime; record that Y is owned by X." The resource ownership path is
|
|
||||||
a different concern from peer-record management, even if they end up
|
|
||||||
sharing storage.
|
|
||||||
|
|
||||||
**The one-way door.** Whether `Identity.resources` stays static-config-
|
1. **No-specific-resource operations (the `list` case).** Operations
|
||||||
sourced or gains a dynamic-ownership supplement (and what the trait shape
|
with `resource_type` set but `resource_id_path` absent — e.g.
|
||||||
for that supplement is) determines what `AccessControl` declarations the
|
`docker/container/list`, which doesn't reference a specific container.
|
||||||
docker/tty/runner/fleet specs can make, what every `AccessControl::check`
|
There the question is "does this peer have *any* resource of this
|
||||||
call site reads, and what every consumer of `Identity` assumes. This is
|
type?" rather than "does this peer own *this* resource?" Option 2
|
||||||
core, not per-crate. The *mechanism* (a new `ResourceOwnershipProvider`
|
handles this naturally (check asks the provider "any resources of
|
||||||
trait vs. extending `IdentityStore` vs. labels/ownership-tags on the
|
type T for this identity?" when no specific ID is present), but the
|
||||||
spawned resources themselves vs. a capability-style model) is two-way —
|
exact semantics need to be pinned: does the ACL gate the whole call
|
||||||
additive, reversible per the ADR-009 definition — but the model-level
|
(allow/deny), or does the handler filter the result to owned
|
||||||
decision is one-way.
|
containers (allow + filter)? The former is the scope-gating path; the
|
||||||
|
latter is the result-filtering path. They compose (scope-gate the
|
||||||
|
call, then filter the result), but the ADR should state which is the
|
||||||
|
default and how a spec declares which it wants. `list` is the case
|
||||||
|
that forces this; `exec`/`inspect`/`stop` against a specific
|
||||||
|
container are the clean case.
|
||||||
|
|
||||||
**Known consumers of the resolution.** Any spec that declares an
|
2. **Teardown coupling.** The ownership store's write path (revoke on
|
||||||
`AccessControl` with `resource_type`/`resource_action` against a
|
teardown) must be coupled to the spawned resource's lifecycle, not
|
||||||
runtime-spawned resource set needs this resolved first. Today that's the
|
left to operator workflows. When a container dies or is removed, the
|
||||||
docker, tty, opencode-runner, and `alknet-container` specs; future
|
ownership entry must be revoked — otherwise the store accumulates
|
||||||
runner-shaped crates (any GPU-job runner, any "spawn a process and expose
|
stale entries and an ACL check could reference a resource that no
|
||||||
it over the call protocol" crate) inherit the same requirement. The
|
longer exists. The coupling mechanism (the docker handler explicitly
|
||||||
resolution should make the model available in core/call so those specs
|
calls revoke on container exit, vs. a background reaper, vs. TTL-based
|
||||||
declare ACLs against it directly, rather than each crate inventing a
|
expiry) is two-way-door mechanism work, but the ADR should state the
|
||||||
per-crate ownership layer.
|
coupling requirement and the default mechanism.
|
||||||
|
|
||||||
|
3. **Fleet representation (spoke resources on the hub).** When a worker
|
||||||
|
spoke spawns a container and exposes it to the hub over the call
|
||||||
|
protocol, the hub's ownership store needs to represent "peer X owns
|
||||||
|
resource R" for routing/ACL on the hub side. Whether the spoke pushes
|
||||||
|
ownership records to the hub on spawn, the hub derives them from
|
||||||
|
`from_call`-discovered operations, or the spoke owns the ACL decision
|
||||||
|
and the hub forwards — is a real question with cross-node state
|
||||||
|
implications. The POC summary's §6 head-worker/machine-node model
|
||||||
|
frames the topology; this question is where that topology meets the
|
||||||
|
ownership model. Likely the most consequential of the three open
|
||||||
|
questions.
|
||||||
|
|
||||||
|
4. **Composition interaction.** ADR-015/022 populate
|
||||||
|
`CompositionAuthority.resources` for internal composition calls. With
|
||||||
|
dynamic ownership, an internal composition that targets a runtime-
|
||||||
|
spawned resource (a handler composing `docker/container/exec` against
|
||||||
|
a specific container) needs the composition authority to be checkable
|
||||||
|
against the ownership store too, not just against the static
|
||||||
|
`CompositionAuthority.resources` map. Whether `CompositionAuthority`
|
||||||
|
grows a dynamic-ownership path parallel to `Identity`, or composition
|
||||||
|
always runs under the caller's ownership, or some third option — needs
|
||||||
|
to be stated so the privilege model stays coherent with the ownership
|
||||||
|
model.
|
||||||
|
|
||||||
|
These are genuine open questions, not deferred decisions — the ADR must
|
||||||
|
answer them. They were surfaced by choosing Option 2 + `resource_id_path`
|
||||||
|
rather than by leaving the integration point undecided; recording them
|
||||||
|
here so the ADR drafting starts from a known set of specifics to work
|
||||||
|
out, not from a blank page.
|
||||||
|
|
||||||
**Phase 0 may be warranted.** Given the one-way door at the model level
|
|
||||||
and the number of plausible mechanisms (each with different tradeoffs
|
|
||||||
around consistency, teardown coupling, fleet-scale state, and how a
|
|
||||||
remote spoke's spawned resources are represented on the hub), a targeted
|
|
||||||
research/POC pass before the ADR is likely the right sequencing — but
|
|
||||||
that's a decision to make once the candidate mechanisms are enumerated,
|
|
||||||
not something this OQ pre-commits.
|
|
||||||
- **Cross-references**: ADR-009 (door-type-as-deferral anti-pattern),
|
- **Cross-references**: ADR-009 (door-type-as-deferral anti-pattern),
|
||||||
ADR-015, ADR-022 (the static `CompositionAuthority.resources` model this
|
ADR-015, ADR-022 (the static `CompositionAuthority.resources` model this
|
||||||
questions), ADR-030, ADR-035 (`IdentityStore` — administrative peer
|
extends — see open question 4), ADR-030, ADR-033 (repo/adapter pattern —
|
||||||
mutations, a different concern from runtime resource ownership),
|
reused for the ownership store), ADR-035 (`IdentityStore` —
|
||||||
|
administrative peer mutations, a different concern from runtime resource
|
||||||
|
ownership, but the sync-read + ArcSwap + honker-NOTIFY shape is reused),
|
||||||
[auth.md](crates/core/auth.md) (`Identity.resources`, `AccessControl::check`
|
[auth.md](crates/core/auth.md) (`Identity.resources`, `AccessControl::check`
|
||||||
interaction),
|
interaction — both under edit by this decision),
|
||||||
[operation-registry.md](crates/call/operation-registry.md) (`AccessControl`,
|
[operation-registry.md](crates/call/operation-registry.md) (`AccessControl`,
|
||||||
`OperationSpec`),
|
`OperationSpec` — `resource_id_path` addition),
|
||||||
[alknet-docker POC summary](../../research/alknet-docker/poc-summary.md)
|
[alknet-docker POC summary](../../research/alknet-docker/poc-summary.md)
|
||||||
§"Open Unknowns" #3
|
§"Open Unknowns" #3
|
||||||
|
|||||||
Reference in New Issue
Block a user