docs(arch): ADR-029 peer-graph routing model — supersedes ADR-028

ADR-028's remote_safe/trusted_peer was a parallel, weaker authorization system
that duplicated the existing AccessControl/Identity machinery and couldn't
express the head→N-workers pattern (the primary use case). The flat-namespace
single-peer overlay model (one connection layer in CompositeOperationEnv)
structurally breaks the moment a head has two workers both exposing
/container/exec.

ADR-029 replaces it with:

- Peer-keyed overlays: PeerCompositeEnv { connections: HashMap<PeerId, ...> }
  replaces CompositeOperationEnv's singular connection layer. A head node
  routes invoke_peer() to the right peer via PeerRef::Specific / PeerRef::Any.
- AccessControl-based peer authorization: the existing
  AccessControl::check(peer_identity) gates peer calls — the same mechanism
  that gates every other call. remote_safe/trusted_peer/RemoteFilter/
  list_operations_peer_scoped/services_list_handler_peer_scoped are retired.
  The op's AccessControl IS the peer-authorization policy; no parallel system.
- ScopedPeerEnv: peer-qualified reachability (peer-pinned allowlist) replaces
  from_call's namespace_prefix as the disambiguation mechanism. Cross-peer
  collision dissolves (separate sub-overlays); same-peer collision stays error.
- services/list-peers opt-in for peer-attributed re-export listing.

POC-validated against real types (scratch module written, type-checked,
removed; build clean, 207 tests pass). Petgraph not needed for v1 (one-hop,
shallow); nested HashMap suffices; extends to multi-hop without redesign (OQ-32).

OQ impact: OQ-25 dissolved (no marking); OQ-28 cross-peer dissolved / same-peer
stays; OQ-26/27/29 stay; new OQ-30 (Any routing policy), OQ-31 (list-peers
semantics), OQ-32 (multi-hop federation).

Research: docs/research/alknet-call-peer-routing/findings.md (POC shapes,
prior art — Ray.io actors, Dapr service invocation, full ADR draft).

ADR-028 marked Superseded; ADR-017 DC-1 amendment updated to point at ADR-029.
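
To make the head→N-workers routing concrete, here is a minimal sketch, not the ADR's implementation. `ToyOverlay`, `ToyPeerEnv`, `add_peer`, and `route` are hypothetical names. Only `PeerId`, `PeerRef::Specific` / `PeerRef::Any`, and the `connections` / `connection_order` fields come from the text, and the `Any` branch assumes the v1 insertion-order first-match policy recorded under OQ-30.

```rust
use std::collections::HashMap;

/// Peer identifier; the ADR text equates it with `Identity.id`.
pub type PeerId = String;

/// Selector for peer-routed calls, mirroring `PeerRef::Specific` / `PeerRef::Any`.
pub enum PeerRef {
    Specific(PeerId),
    Any,
}

/// Hypothetical stand-in for one peer's overlay: just the op names it exposes.
/// (The real overlay is an `OperationEnv` trait object.)
pub struct ToyOverlay {
    ops: Vec<String>,
}

impl ToyOverlay {
    pub fn new(ops: &[&str]) -> Self {
        Self { ops: ops.iter().map(|s| s.to_string()).collect() }
    }

    fn has(&self, op: &str) -> bool {
        self.ops.iter().any(|o| o == op)
    }
}

/// Toy peer-keyed aggregation layer (assumption: simplified from `PeerCompositeEnv`).
#[derive(Default)]
pub struct ToyPeerEnv {
    connections: HashMap<PeerId, ToyOverlay>,
    connection_order: Vec<PeerId>, // insertion order, consulted by PeerRef::Any
}

impl ToyPeerEnv {
    /// Register a peer's overlay; a repeated peer id replaces the overlay
    /// but keeps its original position in the insertion order.
    pub fn add_peer(&mut self, peer: &str, overlay: ToyOverlay) {
        if self.connections.insert(peer.to_string(), overlay).is_none() {
            self.connection_order.push(peer.to_string());
        }
    }

    /// Resolve which peer would serve `op`, or `None` if no peer matches.
    pub fn route(&self, target: &PeerRef, op: &str) -> Option<&PeerId> {
        match target {
            // Pinned: only the named peer is considered.
            PeerRef::Specific(id) => self
                .connections
                .get_key_value(id)
                .filter(|(_, overlay)| overlay.has(op))
                .map(|(key, _)| key),
            // Unpinned: first peer, in insertion order, that exposes the op.
            PeerRef::Any => self
                .connection_order
                .iter()
                .find(|id| self.connections.get(*id).map_or(false, |o| o.has(op))),
        }
    }
}
```

With two workers that both expose `/container/exec`, `PeerRef::Specific` selects the named worker and `PeerRef::Any` falls back to the first connected peer that has the op, so the name clash never needs a prefix to resolve.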
@@ -65,7 +65,8 @@ The alknet-call crate is **implemented and reviewed** — both the server-side c
 | [025](decisions/025-vault-local-only-dispatch.md) | Vault Local-Only Dispatch | Accepted |
 | [026](decisions/026-vault-key-model-hd-derivation.md) | Vault Key Model — HD Derivation | Accepted |
 | [027](decisions/027-tls-identity-redesign-acme-rawkey-decoupling.md) | TLS Identity Redesign — ACME + RawKey Decoupling | Accepted |
-| [028](decisions/028-callclient-peer-scoped-registry-filtering.md) | Peer-Scoped Registry Filtering for CallClient Inbound Dispatch | Accepted |
+| [028](decisions/028-callclient-peer-scoped-registry-filtering.md) | Peer-Scoped Registry Filtering for CallClient Inbound Dispatch | ~~Accepted~~ → **Superseded** by ADR-029 |
+| [029](decisions/029-peer-graph-routing-model.md) | Peer-Graph Routing Model for alknet-call Composition | Proposed |
 
 ## Open Questions
 
@@ -97,12 +98,15 @@ See [open-questions.md](open-questions.md) for the full tracker.
 - **OQ-23**: Handler identity registration path — registration bundle with provenance, composition authority, scoped env, capabilities (ADR-022)
 - **OQ-24**: Operation error schemas — declared domain errors with typed `details` payload; adapter fidelity for `from_openapi`/`to_openapi` (ADR-023)
 
-**Open (two-way-door remainders from alknet-call completion):**
-- **OQ-25**: Remote-safe marking shape — existence of default-deny `CallClient` filtering locked by ADR-028; shape (`remote_safe: bool` v1 vs per-peer allowlist) open
+**Open (two-way-door remainders from alknet-call completion + peer-graph routing):**
+- **OQ-25**: ~~Remote-safe marking shape~~ — **dissolved by ADR-029** (no marking; peer authorization is `AccessControl::check(peer_identity)`)
 - **OQ-26**: `OperationAdapter` error type — `import()` returns `Result<_, AdapterError>`; variants decided in implementation
 - **OQ-27**: `from_call` re-import trigger — v1 default auto-on-reconnect; explicit `refresh()` additive
-- **OQ-28**: `from_call` namespace collision — v1 default error-on-collision (no prefix by default)
-- **OQ-29**: `CallClient` TLS client-auth + remote-identity verification — v1 connects with `with_no_client_auth()` and `AcceptAnyServerCertVerifier`; wiring RawKey client-auth is additive (the no-env-vars invariant is unaffected — `auth_token` flows through the call-protocol payload, not TLS)
+- **OQ-28**: `from_call` namespace collision — cross-peer **dissolved by ADR-029** (separate sub-overlays); same-peer stays error
+- **OQ-29**: `CallClient` TLS client-auth — v1 `with_no_client_auth()` + `AcceptAnyServerCertVerifier`; wiring RawKey client-auth is additive
+- **OQ-30**: `PeerRef::Any` routing policy — v1 insertion-order first-match; round-robin/least-loaded is future (ADR-029)
+- **OQ-31**: `services/list-peers` re-export semantics — v1 "own ops only"; `services/list-peers` is opt-in (ADR-029)
+- **OQ-32**: Multi-hop federation — v1 one-hop; peer-keyed model extends without redesign; petgraph candidate (ADR-029)
 
 **Deferred (not active):**
 - **OQ-09**: WASM target boundaries — design constraint, not deliverable
@@ -38,7 +38,8 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions,
 | [022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Handler Registration, Provenance, and Composition Authority | Registration bundle carries provenance, composition authority, scoped env, capabilities |
 | [023](../../decisions/023-operation-error-schemas.md) | Operation Error Schemas | Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity |
 | [024](../../decisions/024-operation-registry-layering.md) | Operation Registry Layering | Curated (static) + session/connection overlays (dynamic); `OperationEnv` as trait-object integration point; `OperationContext.env` split into `scoped_env` (data) and `env` (dispatch trait) |
-| [028](../../decisions/028-callclient-peer-scoped-registry-filtering.md) | Peer-Scoped Registry Filtering for CallClient Inbound Dispatch | Default-deny peer-scoped registry view; `remote_safe` marking on `HandlerRegistration`; trusted-peer opt-in; locks the ADR-017 §1 security-dimension one-way door |
+| [028](../../decisions/028-callclient-peer-scoped-registry-filtering.md) | ~~Peer-Scoped Registry Filtering~~ | ~~Accepted~~ → **Superseded** by ADR-029 (flat-namespace single-peer model couldn't express head→N-workers; parallel auth system duplicated `AccessControl`) |
+| [029](../../decisions/029-peer-graph-routing-model.md) | Peer-Graph Routing Model | Peer-keyed overlays + `PeerRef` routing; `AccessControl`-based peer authorization; retires `remote_safe`/`trusted_peer` |
 
 ## Relevant Open Questions
 
@@ -49,11 +50,14 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions,
 | OQ-14 | Batch operation semantics | resolved | Correlated `call.requested` events is the correct protocol design |
 | OQ-16 | Safe vault operations for call protocol exposure | resolved (ADR-014) | None exposed for now |
 | OQ-19 | Session-scoped operation registries | resolved | Agent-written operations overlaid on curated registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait. Generalized by ADR-024 to cover connection-scoped overlays. |
-| OQ-25 | Remote-safe marking shape for CallClient peer-scoped filtering | open (two-way) | Existence of default-deny filtering locked by ADR-028; shape (`remote_safe: bool` v1 vs per-peer allowlist) is the two-way-door remainder |
+| OQ-25 | ~~Remote-safe marking shape~~ | **dissolved** (ADR-029) | `remote_safe`/`trusted_peer` retired; peer authorization is `AccessControl::check(peer_identity)` |
 | OQ-26 | OperationAdapter error type (AdapterError variants) | open (two-way) | `import()` returns `Result<_, AdapterError>`; variants decided in implementation |
-| OQ-27 | from_call re-import trigger | open (two-way) | v1 default: auto-on-reconnect; explicit `refresh()` is additive |
-| OQ-28 | from_call namespace collision behavior | open (two-way) | v1 default: error on collision (no prefix by default) |
-| OQ-29 | CallClient TLS client-auth and remote-identity verification | open (two-way) | v1 connects with `with_no_client_auth()` + `AcceptAnyServerCertVerifier`; wiring RawKey client-auth and a real `ServerCertVerifier` is additive (no-env-vars invariant unaffected — `auth_token` flows via call-protocol payload, not TLS) |
+| OQ-27 | from_call re-import trigger | open (two-way) | v1 default: auto-on-reconnect; explicit `refresh()` additive |
+| OQ-28 | from_call namespace collision | cross-peer **dissolved** (ADR-029) / same-peer stays | Cross-peer: separate sub-overlays, no collision. Same-peer: error. `namespace_prefix` is local-naming sugar |
+| OQ-29 | CallClient TLS client-auth and remote-identity verification | open (two-way) | v1 `with_no_client_auth()` + `AcceptAnyServerCertVerifier`; wiring RawKey client-auth is additive (orthogonal to ADR-029) |
+| OQ-30 | `PeerRef::Any` routing policy | open (two-way) | v1 insertion-order first-match; round-robin/least-loaded is future (ADR-029) |
+| OQ-31 | `services/list-peers` re-export semantics | open (two-way) | v1 "own ops only"; `services/list-peers` is opt-in (ADR-029) |
+| OQ-32 | Multi-hop federation | open | v1 one-hop; peer-keyed model extends without redesign; petgraph candidate (ADR-029) |
 
 ## Key Design Principles
 
@@ -168,10 +168,13 @@ The dispatch loop is **shared** with `CallClient` (ADR-017 §1): both
 `CallAdapter::handle` (accept path) and `CallClient::connect` (connect path)
 construct a `Dispatcher` (`protocol/dispatch.rs`) and call `run_loop` — the
 dispatch half is one implementation, the connection-establishment half differs
-(accept vs dial). The `Dispatcher` carries a `RemoteFilter` (ADR-028) that
-gates dispatch by `remote_safe`; the accept path uses `RemoteFilter::trusted()`
-by convention. See [client-and-adapters.md](client-and-adapters.md) for the
-`Dispatcher`/`RemoteFilter` mechanism.
+(accept vs dial). Peer authorization flows through the existing
+`AccessControl::check(peer_identity)` — no `RemoteFilter`/`remote_safe` gate
+(ADR-029 §3). The composition env is peer-keyed (`PeerCompositeEnv`,
+ADR-029 §1) to handle head→N-workers routing. See
+[client-and-adapters.md](client-and-adapters.md) for the `Dispatcher` mechanism
+and [ADR-029](../../decisions/029-peer-graph-routing-model.md) for the
+peer-graph routing model.
 
 ### Stream Model
 
@@ -535,7 +538,7 @@ Handlers clean up resources when their call is cancelled (in Rust, the future is
 | Abort cascade for nested calls | [ADR-016](../../decisions/016-abort-cascade-for-nested-calls.md) | `call.aborted` cascades to descendants; default `abort-dependents`, `continue-running` opt-in |
 | Call protocol client and adapter contract | [ADR-017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | `CallClient` opens connections; `from_call` imports remote ops; connection direction independent of call direction. Client/adapter surface specced in [client-and-adapters.md](client-and-adapters.md) |
 | Handler registration, provenance, and composition authority | [ADR-022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Registration bundle carries provenance, composition authority, scoped env, capabilities; dispatch path reads from bundle |
-| Peer-scoped registry filtering for CallClient | [ADR-028](../../decisions/028-callclient-peer-scoped-registry-filtering.md) | Default-deny `CallClient` registry view; `remote_safe` marking; trusted-peer opt-in |
+| Peer-graph routing model (supersedes ADR-028) | [ADR-029](../../decisions/029-peer-graph-routing-model.md) | Peer-keyed overlays + `PeerRef` routing; `AccessControl`-based peer authorization; retires `remote_safe`/`trusted_peer` |
 | Operation error schemas | [ADR-023](../../decisions/023-operation-error-schemas.md) | Operations declare domain errors; `call.error` carries typed `details` |
 
 ## Open Questions
@@ -546,8 +549,15 @@ See [open-questions.md](../../open-questions.md) for full details.
 - **OQ-14** (resolved): Batch is a client-side pattern of correlated `call.requested` events, not a protocol primitive.
 - **OQ-16** (resolved by ADR-014): No vault operations are exposed over the call protocol for now.
 - **OQ-19** (resolved): Session-scoped operation registries — agent-written operations overlaid on global registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait.
-- **OQ-25..28** (open, two-way): Call-completion remainders — `CallClient` remote-safe marking shape, `OperationAdapter` error type, `from_call` re-import trigger, `from_call` namespace collision. The `CallClient`/adapter surface itself is specced in [client-and-adapters.md](client-and-adapters.md); the one-way door among these (existence of default-deny filtering) is resolved by ADR-028.
-- **OQ-29** (open, two-way): `CallClient` TLS client-auth + remote-identity verification — v1 connects with `with_no_client_auth()` and `AcceptAnyServerCertVerifier`; wiring RawKey client-auth and a real `ServerCertVerifier` is additive. See [client-and-adapters.md](client-and-adapters.md).
+- **OQ-25** (dissolved by ADR-029): `remote_safe` marking shape — moot;
+`remote_safe`/`trusted_peer` retired; peer authorization is
+`AccessControl::check(peer_identity)`.
+- **OQ-26..29** (OQ-26/27/29 open two-way; OQ-28 cross-peer dissolved / same-peer stays):
+`OperationAdapter` error type, `from_call` re-import trigger, `from_call`
+namespace collision, `CallClient` TLS client-auth. See
+[client-and-adapters.md](client-and-adapters.md) and ADR-029.
+- **OQ-30..32** (open): `PeerRef::Any` routing policy, `services/list-peers`
+re-export semantics, multi-hop federation. See ADR-029.
 
 ## References
 
@@ -1,6 +1,6 @@
 ---
 status: draft
-last_updated: 2026-06-26
+last_updated: 2026-06-27
 ---
 
 # alknet-call — Client and Adapters
@@ -61,9 +61,16 @@ fills the gap ADR-017 left to implementation: the `CallClient` API, the
 `from_call`/`from_jsonschema` flows, the trait signature, the adapter
 location, the credential invariant, and the bilateral pattern. The gap
 analysis (`docs/research/alknet-call-completion/gap-analysis.md`) identified
-four decisions (DC-1..4) needed before implementation; DC-1 is resolved by
-ADR-028, and DC-2/3/4 are two-way-door defaults recorded here and tracked as
-OQs (DC-2→OQ-27, DC-3→OQ-28, DC-4→OQ-26).
+four decisions (DC-1..4) needed before implementation. DC-1 was initially
+resolved by ADR-028 (`remote_safe`/`trusted_peer`), but a subsequent research
+pass (`docs/research/alknet-call-peer-routing/findings.md`) found that
+ADR-028's model was structurally broken for the head→N-workers pattern (the
+primary use case) and that its parallel `remote_safe`/`trusted_peer`
+authorization system duplicated the existing `AccessControl`/`Identity`
+machinery. **ADR-029 supersedes ADR-028**: peer-keyed overlays + `PeerRef`
+routing, and peer authorization through the existing `AccessControl::check(peer_identity)`.
+DC-2/3/4 are two-way-door defaults recorded here (DC-2→OQ-27, DC-3→OQ-28
+cross-peer dissolved / same-peer stays, DC-4→OQ-26).
 
 ## Architecture
 
@@ -79,31 +86,13 @@ accept path is the producer on the inbound side. Both produce the same
 
 ```rust
 pub struct CallClient {
-    /// The operation registry. The peer-scoped view is a dispatch-time read
-    /// over this registry, not a copy (ADR-028 §5).
     registry: Arc<OperationRegistry>,
     identity_provider: Arc<dyn IdentityProvider>,
-    /// Trusted-peer mode (ADR-028 §3): when true, the dispatch path exposes
-    /// all External ops to the remote peer and `services/list` lists all
-    /// External ops, ignoring the `remote_safe` marking. When false
-    /// (default), only registrations with `remote_safe: true` dispatch, and
-    /// `services/list` hides non-remote-safe ops (ADR-028 Assumption 2).
-    trusted_peer: bool,
 }
 
 impl CallClient {
-    /// Default-deny mode: only `remote_safe: true` ops dispatch/list to the
-    /// remote peer (ADR-028).
     pub fn new(registry: Arc<OperationRegistry>, idp: Arc<dyn IdentityProvider>) -> Self;
 
-    /// Trusted-peer mode: construct a CallClient that exposes all External
-    /// ops from `registry` to the remote peer, ignoring the remote-safe
-    /// marking. Explicit opt-in per ADR-028 §3.
-    pub fn trusted_peer(
-        registry: Arc<OperationRegistry>,
-        identity_provider: Arc<dyn IdentityProvider>,
-    ) -> Self;
-
     /// Open a QUIC connection to `addr` on ALPN `alknet/call`, perform
     /// credential handshake, and return a CallConnection running the shared
     /// dispatch loop. Credentials come from capabilities (ADR-014), not env
@@ -118,20 +107,25 @@ impl CallClient {
 }
 ```
 
-The v1 mechanism is the `trusted_peer: bool` flag plus the `remote_safe: bool`
-field on each `HandlerRegistration` (default `false` across all provenance,
-ADR-028 §4). A richer per-peer filtering mechanism (per-peer allowlist,
-capability-class tag) is the two-way-door remainder tracked as OQ-25; v1's
-boolean limits exposure control to "remote-safe for any peer" vs "not," which
-is acceptable for the runner/dispatch pattern (one remote peer per
-`CallClient`).
+Peer authorization flows through the existing `AccessControl::check` against
+the peer's resolved `Identity` (ADR-029 §3) — there is no `trusted_peer` flag
+and no `remote_safe` marking. When a remote peer calls an op, the dispatch
+path resolves the peer's `Identity` (from the connection's TLS fingerprint or
+the `auth_token` payload, via the existing `IdentityProvider`) and runs
+`AccessControl::check(peer_identity)` against the op's `AccessControl`. If
+the op's required scopes/resources are satisfied, the call dispatches; if not,
+`FORBIDDEN` before the handler runs (capabilities never populated — the
+security property). An op that should never be callable from the wire uses
+`Visibility::Internal` (existing mechanism, `NOT_FOUND` before ACL). See
+[ADR-029](../../decisions/029-peer-graph-routing-model.md) §3 for the full
+mapping of the three `remote_safe` cases to `AccessControl`/`Visibility`.
 
 The connection is symmetric after establishment (ADR-017 §2): both sides can
 send and receive `call.requested`. Connection direction (who opened it) is
 independent of call direction (who calls whom). The `CallClient` is therefore
 both a caller and a callee — it dispatches incoming calls from the remote
-peer against its peer-scoped registry view, and it initiates outgoing calls
-through the `CallConnection::call()` / `subscribe()` / `abort()` API.
+peer through the same `AccessControl`-gated path, and it initiates outgoing
+calls through the `CallConnection::call()` / `subscribe()` / `abort()` API.
 
 #### Shared Dispatcher
 
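
The gate order described in the peer-authorization paragraph of the hunk above can be sketched as follows. This is a toy model under stated assumptions, not the crate's code: `ToyIdentity`, `ToyAccessControl`, `ToyVisibility`, `ToyOp`, `Gate`, and `gate_peer_call` are hypothetical names, scopes are modeled as plain strings, and resource checks are left out.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified peer identity: an id plus the scopes it holds.
pub struct ToyIdentity {
    pub id: String,
    pub scopes: HashSet<String>,
}

/// Hypothetical, simplified policy: the scopes a caller must hold.
pub struct ToyAccessControl {
    pub required_scopes: Vec<String>,
}

impl ToyAccessControl {
    /// True when the identity holds every required scope.
    pub fn check(&self, identity: &ToyIdentity) -> bool {
        self.required_scopes.iter().all(|s| identity.scopes.contains(s))
    }
}

#[derive(Clone, Copy, PartialEq, Eq)]
pub enum ToyVisibility {
    External,
    Internal,
}

pub struct ToyOp {
    pub visibility: ToyVisibility,
    pub access: ToyAccessControl,
}

/// Outcome of the pre-handler gate.
#[derive(Debug, PartialEq, Eq)]
pub enum Gate {
    Dispatch,
    Forbidden,
    NotFound,
}

/// Visibility is checked before the ACL, and both run before any handler
/// would be invoked, so a rejected call never reaches handler code.
pub fn gate_peer_call(op: &ToyOp, peer: &ToyIdentity) -> Gate {
    if op.visibility == ToyVisibility::Internal {
        return Gate::NotFound; // never callable from the wire
    }
    if !op.access.check(peer) {
        return Gate::Forbidden; // peer lacks a required scope
    }
    Gate::Dispatch
}
```

The point of the ordering is that an internal op is indistinguishable from a missing one, while an external op the peer is not authorized for is refused before any capability material would be assembled.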
@@ -143,13 +137,6 @@ accept path and `CallClient`'s connect path construct a `Dispatcher` and call
 connection-establishment half differs (accept vs dial).
 
 ```rust
-/// Peer-scoped registry filter state (ADR-028). `trusted_peer: false`
-/// (default-deny for a CallClient) hides ops whose
-/// `HandlerRegistration.remote_safe` is false from both dispatch and
-/// `services/list`. `trusted_peer: true` (explicit opt-in, also used by the
-/// CallAdapter's local accept path) bypasses the filter.
-pub struct RemoteFilter { pub trusted_peer: bool }
-
 /// Shared dispatcher for an established CallConnection. Constructed by both
 /// CallAdapter (accept path) and CallClient (connect path). Holds no
 /// per-connection state; the CallConnection is passed into run_loop.
@@ -158,37 +145,54 @@ pub struct Dispatcher {
     pub identity_provider: Arc<dyn IdentityProvider>,
     pub session_source: Option<Arc<dyn SessionOverlaySource + Send + Sync>>,
     pub default_timeout: Duration,
-    pub remote_filter: RemoteFilter,
 }
 ```
 
-The `remote_filter` is the dispatch-time gate that enforces ADR-028's
-default-deny: `dispatch_requested` checks `remote_filter.allows(registration.remote_safe)`
-**before** building the context or invoking the handler — a non-remote-safe op
-returns `NOT_FOUND` before any capability material reaches the handler (the
-security argument for default-deny, ADR-028 Context). The accept path
-(`CallAdapter`) uses `RemoteFilter::trusted()` by convention — a direct QUIC
-client is not a filtered `CallClient` peer in the ADR-028 sense.
+The dispatch path resolves the peer's `Identity`, runs `AccessControl::check`
+against the op's `AccessControl`, and dispatches if allowed — the same
+authorization machinery that gates every other call. No `RemoteFilter`, no
+`remote_safe` gate (ADR-029 §3 retires these).
 
 `CallClient::spawn_dispatch(connection)` is the lower-level API that takes a
 pre-established `Connection`, constructs a `CallConnection`, builds a
-`Dispatcher` with the appropriate `RemoteFilter`, spawns the dispatch task,
-and returns the live `CallConnection`. `connect()` uses it after the QUIC dial
-completes; tests use it to wire mock/loopback connections directly.
+`Dispatcher`, spawns the dispatch task, and returns the live `CallConnection`.
+`connect()` uses it after the QUIC dial completes; tests use it to wire
+mock/loopback connections directly.
 
-#### services/list peer-scoped serving
+#### Peer-keyed composition env (ADR-029)
 
-The `services/list` hide behavior (ADR-028 Assumption 2) is wired via a
-separate handler factory: `services_list_handler_peer_scoped(registry,
-trusted_peer)` in `registry/discovery.rs`, backed by
-`OperationRegistry::list_operations_peer_scoped(trusted_peer)`. The assembly
-layer constructs the `CallClient`'s registry with this peer-scoped handler
-(not the plain `services_list_handler` used by the `CallAdapter`'s local
-accept path) so that when the remote peer calls `services/list` on the
-`CallClient`, the response hides non-remote-safe ops in default-deny mode.
-The dispatch-path `RemoteFilter` (above) and the `services/list`-handler
-filter are the two halves of the same default-deny posture — discovery and
-dispatch filters agree.
+The composition env that aggregates multiple connections is **peer-keyed**
+(ADR-029 §1). `CompositeOperationEnv`'s singular
+`connection: Option<Arc<dyn OperationEnv>>` is replaced by `PeerCompositeEnv`
+with peer-keyed connections:
+
+```rust
+pub struct PeerCompositeEnv {
+    pub base: Arc<dyn OperationEnv + Send + Sync>, // Layer 0 curated
+    pub session: Option<Arc<dyn OperationEnv + Send + Sync>>, // Layer 1
+    pub connections: HashMap<PeerId, Arc<dyn OperationEnv + Send + Sync>>, // Layer 2, peer-keyed
+    connection_order: Vec<PeerId>, // insertion order for PeerRef::Any first-match
+}
+pub type PeerId = String; // = Identity.id
+```
+
+`OperationEnv` gains a peer-routing method with a `PeerRef` selector
+(`Specific(PeerId)` / `Any`), default-impl for back-compat. See
+[ADR-029](../../decisions/029-peer-graph-routing-model.md) §2 for the full
+`invoke_peer` signature and `ScopedPeerEnv` peer-qualified reachability. The
+per-`CallConnection` overlay stays flat (one connection = one peer); the
+peer-keying is at the aggregation layer (the head node's composition env).
+
+#### services/list
+
+`services/list` filters by `AccessControl::check(calling_peer_identity)` —
+the calling peer sees only ops it is authorized to call. The
+`services_list_handler` / `services_list_handler_peer_scoped` split collapses
+to a single `AccessControl`-filtered handler (the `peer_scoped` variant and
+the `remote_safe` filter are removed). `services/list-peers` is the opt-in for
+peer-attributed re-export listing (each peer's sub-overlay listed with
+attribution, filtered by the calling peer's authorization). See
+[ADR-029](../../decisions/029-peer-graph-routing-model.md) §6.
 
 ### Credential sources for connections
 
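
As an illustration of the `services/list` rule in the hunk above (the calling peer sees only ops it is authorized to call), here is a small sketch. `ToyListedOp` and `list_for_peer` are hypothetical names, scopes are plain strings, and hiding internal ops from the listing is an assumption carried over from the dispatch path rather than something the diff states.

```rust
use std::collections::BTreeSet;

/// Hypothetical registry entry: a name, a wire-visibility flag, and the
/// scopes a caller must hold.
pub struct ToyListedOp {
    pub name: String,
    pub internal: bool,
    pub required_scopes: Vec<String>,
}

/// Return the op names a peer holding `peer_scopes` may see, sorted by name.
/// The filter is the same predicate a dispatch gate would apply, so
/// discovery and dispatch cannot disagree.
pub fn list_for_peer(ops: &[ToyListedOp], peer_scopes: &BTreeSet<String>) -> Vec<String> {
    let mut names: Vec<String> = ops
        .iter()
        .filter(|op| !op.internal) // assumption: internal ops are never listed
        .filter(|op| op.required_scopes.iter().all(|s| peer_scopes.contains(s)))
        .map(|op| op.name.clone())
        .collect();
    names.sort();
    names
}
```

A peer holding no scopes would see only ops whose policy requires none; granting a scope widens the listing without any separate peer-scoped handler.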
@@ -287,10 +291,14 @@ a stale overlay dies with the connection; re-import on reconnect is naturally
|
|||||||
scoped to the new connection. This is the v1 default; explicit re-import via a
|
scoped to the new connection. This is the v1 default; explicit re-import via a
|
||||||
future `CallConnection::refresh()` is additive.
|
future `CallConnection::refresh()` is additive.
|
||||||
|
|
||||||
**Namespace collision** (DC-3, OQ-28): optional prefix, default no prefix,
|
**Namespace collision** (DC-3, OQ-28): under the peer-graph model (ADR-029),
|
||||||
collision = error. A node importing from two remotes that both expose
|
cross-peer collision dissolves — same name on different peers is fine (they
|
||||||
`/container/exec` without prefixes should fail loudly. The operator adds
|
live in separate peer sub-overlays, no prefix needed). Same-peer collision
|
||||||
prefixes when they know they're importing from multiple sources.
|
stays an error (a peer shouldn't expose two ops with the same name).
|
||||||
|
`FromCallConfig::namespace_prefix` is optional local-naming sugar for when
|
||||||
|
the importing node wants to expose a peer's ops under a different name
|
||||||
|
*locally* — a local-naming concern, not a disambiguation concern. It defaults
|
||||||
|
to `None`.
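
The collision rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's implementation: `ImportedOps` and its `import` method are hypothetical names, overlays are reduced to sets of operation names, and `AdapterError::SamePeerCollision` is the variant that OQ-26 only says *may* be introduced.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical error shape; OQ-26 leaves the real variants open.
#[derive(Debug, PartialEq)]
enum AdapterError {
    SamePeerCollision { peer: String, op: String },
}

/// Hypothetical stand-in for the peer-keyed Layer 2 overlays:
/// one flat set of op names per peer.
#[derive(Default)]
struct ImportedOps {
    by_peer: HashMap<String, HashSet<String>>,
}

impl ImportedOps {
    /// Register `op` in `peer`'s sub-overlay.
    /// The same name on a different peer is fine; the same name twice
    /// on one peer is an error.
    fn import(&mut self, peer: &str, op: &str) -> Result<(), AdapterError> {
        let overlay = self.by_peer.entry(peer.to_string()).or_default();
        // `HashSet::insert` returns false when the value was already present.
        if !overlay.insert(op.to_string()) {
            return Err(AdapterError::SamePeerCollision {
                peer: peer.to_string(),
                op: op.to_string(),
            });
        }
        Ok(())
    }
}
```

Importing `container/exec` from `worker-a` and then from `worker-b` succeeds in this sketch, while importing it a second time from `worker-a` returns `SamePeerCollision`. No prefix is involved in either case.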
|
||||||
|
|
||||||
**Trust is transitive** (recorded in `operation-registry.md`): a
|
**Trust is transitive** (recorded in `operation-registry.md`): a
|
||||||
`from_call`-imported operation executes the remote node's code, not yours.
|
`from_call`-imported operation executes the remote node's code, not yours.
|
||||||
@@ -520,10 +528,13 @@ Based on the gap analysis and the downstream unblock chain:
|
|||||||
4. **`from_jsonschema`** (medium, standalone) — schema-only registration, no
|
4. **`from_jsonschema`** (medium, standalone) — schema-only registration, no
|
||||||
handler. Small.
|
handler. Small.
|
||||||
|
|
||||||
5. **DC-1 resolution** (peer-scoped registry filtering, ADR-028) — the
|
5. **DC-1 resolution** (peer-graph routing model, ADR-029) — the
|
||||||
security dimension of `CallClient`'s registry. Addressed in parallel with
|
peer-keyed overlay + `AccessControl`-based peer authorization model that
|
||||||
#1 — it's a filtering layer on the registry the `CallClient` exposes, not
|
replaces ADR-028's `remote_safe`/`trusted_peer`. This is a structural
|
||||||
a blocker for the connection-establishment work.
|
change to `CompositeOperationEnv` (→ `PeerCompositeEnv`), the dispatch
|
||||||
|
path (retire `RemoteFilter`), and `OperationEnv` (gain `invoke_peer`).
|
||||||
|
See ADR-029 for the migration; the POC shapes in the research doc are the
|
||||||
|
reference.
|
||||||
|
|
||||||
## What This Completion Unblocks
|
## What This Completion Unblocks
|
||||||
|
|
||||||
@@ -547,13 +558,23 @@ Based on the gap analysis and the downstream unblock chain:
|
|||||||
call protocol's wire format carries no private keys, API keys, or decrypted
|
call protocol's wire format carries no private keys, API keys, or decrypted
|
||||||
credentials (ADR-014). The no-env-vars invariant (above) is the dispatch-side
|
credentials (ADR-014). The no-env-vars invariant (above) is the dispatch-side
|
||||||
corollary.
|
corollary.
|
||||||
- **Peer-scoped registry is default-deny.** A `CallClient` exposes no
|
- **Peer authorization via `AccessControl`.** A remote peer's call is
|
||||||
operations to the remote peer unless marked remote-safe. Trusted-peer
|
authorized by `AccessControl::check(peer_identity)` against the op's
|
||||||
opt-in is explicit (ADR-028).
|
`AccessControl` — the same mechanism that gates every other call. No
|
||||||
|
`remote_safe` flag, no `trusted_peer` bypass (ADR-029 §3). An op with
|
||||||
|
`AccessControl::default()` is callable by any peer; an op with
|
||||||
|
`required_scopes` is callable only by peers whose `Identity.scopes` satisfy
|
||||||
|
them; an op with `Visibility::Internal` is never callable from the wire.
|
||||||
|
- **Composition env is peer-keyed.** A head node with N worker connections
|
||||||
|
holds a `PeerCompositeEnv` with `connections: HashMap<PeerId, Arc<dyn OperationEnv>>`,
|
||||||
|
not a singular connection overlay. `invoke_peer()` routes to the right peer
|
||||||
|
via `PeerRef::Specific` / `PeerRef::Any` (ADR-029 §1-2).
|
||||||
- **`from_call` re-import is auto-on-reconnect.** v1 default; the overlay is
|
- **`from_call` re-import is auto-on-reconnect.** v1 default; the overlay is
|
||||||
per-connection so re-import is naturally scoped (DC-2, OQ-27).
|
per-connection so re-import is naturally scoped (DC-2, OQ-27).
|
||||||
- **`from_call` namespace collision is an error.** Default no prefix; the
|
- **`from_call` namespace collision is same-peer only.** Cross-peer collision
|
||||||
operator adds prefixes when importing from multiple sources (DC-3, OQ-28).
|
dissolves (same name on different peers is fine — separate sub-overlays,
|
||||||
|
ADR-029 §5). Same-peer collision stays an error. `namespace_prefix` is
|
||||||
|
optional local-naming sugar, not the disambiguation mechanism (DC-3, OQ-28).
|
||||||
- **`OperationAdapter::import()` returns `Result`.** Failures surface as
|
- **`OperationAdapter::import()` returns `Result`.** Failures surface as
|
||||||
`AdapterError` (DC-4, OQ-26).
|
`AdapterError` (DC-4, OQ-26).
|
||||||
- **MCP stdio transport is not built.** Streamable HTTP is the only supported
|
- **MCP stdio transport is not built.** Streamable HTTP is the only supported
|
||||||
@@ -565,7 +586,8 @@ Based on the gap analysis and the downstream unblock chain:
|
|||||||
| Decision | ADR | Summary |
|
| Decision | ADR | Summary |
|
||||||
|----------|-----|---------|
|
|----------|-----|---------|
|
||||||
| Call protocol client and adapter contract | [ADR-017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | `CallClient` opens connections; `from_call` imports remote ops; connection direction independent of call direction; trait is async; adapters produce `HandlerRegistration` bundles |
|
| Call protocol client and adapter contract | [ADR-017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | `CallClient` opens connections; `from_call` imports remote ops; connection direction independent of call direction; trait is async; adapters produce `HandlerRegistration` bundles |
|
||||||
| Peer-scoped registry filtering (DC-1) | [ADR-028](../../decisions/028-callclient-peer-scoped-registry-filtering.md) | Default-deny; `remote_safe: bool` on `HandlerRegistration`; trusted-peer opt-in; one-way door on the security dimension |
|
| Peer-graph routing model (DC-1, supersedes ADR-028) | [ADR-029](../../decisions/029-peer-graph-routing-model.md) | Peer-keyed overlays + `PeerRef` routing; peer authorization via existing `AccessControl::check(peer_identity)`; retires `remote_safe`/`trusted_peer` |
|
||||||
|
| ~~Peer-scoped registry filtering~~ (superseded) | ~~[ADR-028](../../decisions/028-callclient-peer-scoped-registry-filtering.md)~~ | ~~Default-deny; `remote_safe: bool`; trusted-peer opt-in~~ — superseded by ADR-029 (flat-namespace single-peer model couldn't express head→N-workers; parallel auth system duplicated existing `AccessControl`) |
|
||||||
| Secret material flow and capability injection | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | The no-env-vars invariant's foundation; capabilities injected at assembly layer |
|
| Secret material flow and capability injection | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | The no-env-vars invariant's foundation; capabilities injected at assembly layer |
|
||||||
| Handler registration, provenance, and composition authority | [ADR-022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | The registration bundle adapters produce; `composition_authority: None` for leaves |
|
| Handler registration, provenance, and composition authority | [ADR-022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | The registration bundle adapters produce; `composition_authority: None` for leaves |
|
||||||
| Operation registry layering | [ADR-024](../../decisions/024-operation-registry-layering.md) | Layer 2 per-connection overlay where `from_call` imports land |
|
| Operation registry layering | [ADR-024](../../decisions/024-operation-registry-layering.md) | Layer 2 per-connection overlay where `from_call` imports land |
|
||||||
@@ -583,38 +605,50 @@ Based on the gap analysis and the downstream unblock chain:
|
|||||||
|
|
||||||
See [open-questions.md](../../open-questions.md) for full details.
|
See [open-questions.md](../../open-questions.md) for full details.
|
||||||
|
|
||||||
- **OQ-25** (open, two-way): Remote-safe marking shape — `remote_safe: bool`
|
- **OQ-25** (dissolved by ADR-029): `remote_safe` marking shape — moot.
|
||||||
v1 vs per-peer allowlist vs capability-class tag. The *existence* of
|
`remote_safe`/`trusted_peer` are retired; peer authorization is
|
||||||
filtering is locked by ADR-028; the shape is the two-way-door remainder.
|
`AccessControl::check(peer_identity)`. No marking to shape.
|
||||||
- **OQ-26** (open, two-way): `AdapterError` enum variants (DC-4). The
|
- **OQ-26** (open, two-way): `AdapterError` enum variants (DC-4). The
|
||||||
*presence* of an error type is recorded here; the variants are
|
*presence* of an error type is recorded here; the variants are
|
||||||
implementation-detail.
|
implementation-detail. A `SamePeerCollision` variant may replace the flat
|
||||||
|
`Conflict` variant (ADR-029 §5).
|
||||||
- **OQ-27** (open, two-way): `from_call` re-import trigger — auto-on-reconnect
|
- **OQ-27** (open, two-way): `from_call` re-import trigger — auto-on-reconnect
|
||||||
(v1 default, recorded here) vs explicit `CallConnection::refresh()`. v1 is
|
(v1 default, recorded here) vs explicit `CallConnection::refresh()`. v1 is
|
||||||
auto-on-reconnect; the explicit path is additive.
|
auto-on-reconnect; the explicit path is additive. The overlay is now
|
||||||
- **OQ-28** (open, two-way): `from_call` namespace collision behavior — error
|
peer-scoped (drops with the connection), so re-import is naturally scoped.
|
||||||
on collision (v1 default, recorded here) vs last-wins.
|
- **OQ-28** (cross-peer dissolved by ADR-029 / same-peer stays): Cross-peer
|
||||||
|
collision dissolves — same name on different peers is fine (separate
|
||||||
|
sub-overlays). Same-peer collision stays an error. `namespace_prefix` is
|
||||||
|
optional local-naming sugar, not the disambiguation mechanism.
|
||||||
- **OQ-29** (open, two-way): `CallClient` TLS client-auth + remote-identity
|
- **OQ-29** (open, two-way): `CallClient` TLS client-auth + remote-identity
|
||||||
verification — v1 connects with `with_no_client_auth()` and
|
verification — v1 connects with `with_no_client_auth()` and
|
||||||
`AcceptAnyServerCertVerifier` (does not present a client cert, does not pin
|
`AcceptAnyServerCertVerifier`. Wiring RawKey client-auth is additive.
|
||||||
the remote's expected identity from `credentials.remote_identity`). Wiring
|
Orthogonal to the routing model (ADR-029); `auth_token` flows through the
|
||||||
the local node's RawKey/X509 identity as a rustls client-auth cert and
|
call-protocol payload, not TLS, so the no-env-vars invariant is unaffected.
|
||||||
plugging `remote_identity` into a real `ServerCertVerifier` is additive.
|
- **OQ-30** (open, two-way): `PeerRef::Any` routing policy — v1 insertion-order
|
||||||
The one-way constraint (credentials from `Capabilities`, ADR-014) is
|
first-match; round-robin/least-loaded is the future extension (ADR-029 §2).
|
||||||
unaffected — `auth_token` flows through the call-protocol payload, not TLS.
|
- **OQ-31** (open, two-way): `services/list-peers` re-export semantics — v1
|
||||||
|
defaults to "own ops only"; `services/list-peers` is the opt-in (ADR-029 §6).
|
||||||
|
- **OQ-32** (open): Multi-hop federation — v1 is one-hop; the peer-keyed
|
||||||
|
overlay model extends to multi-hop without redesign; petgraph is the
|
||||||
|
candidate if path-finding becomes real (ADR-029 §3.7).
|
||||||
|
|
||||||
## References
|
## References
|
||||||
|
|
||||||
- ADR-017: Call Protocol Client and Adapter Contract (the spec this document
|
- ADR-017: Call Protocol Client and Adapter Contract (the spec this document
|
||||||
operationally fills)
|
operationally fills)
|
||||||
- ADR-028: Peer-Scoped Registry Filtering for CallClient Inbound Dispatch
|
- ADR-029: Peer-Graph Routing Model (supersedes ADR-028; resolves DC-1 with
|
||||||
(resolves DC-1)
|
peer-keyed overlays + `AccessControl`-based peer authorization)
|
||||||
|
- ~~ADR-028~~: Peer-Scoped Registry Filtering (superseded by ADR-029)
|
||||||
- `call-protocol.md` — `CallAdapter`, `CallConnection`, dispatch loop, stream
|
- `call-protocol.md` — `CallAdapter`, `CallConnection`, dispatch loop, stream
|
||||||
model (the server-side complement to this document)
|
model (the server-side complement to this document)
|
||||||
- `operation-registry.md` — `HandlerRegistration`, provenance, capability
|
- `operation-registry.md` — `HandlerRegistration`, provenance, capability
|
||||||
injection, service discovery (the discovery API `from_call` consumes)
|
injection, service discovery (the discovery API `from_call` consumes)
|
||||||
- `docs/research/alknet-call-completion/gap-analysis.md` — DC-1..4, the
|
- `docs/research/alknet-call-completion/gap-analysis.md` — DC-1..4, the
|
||||||
implementation-state audit, the downstream unblock chain
|
implementation-state audit, the downstream unblock chain
|
||||||
|
- `docs/research/alknet-call-peer-routing/findings.md` — the peer-graph
|
||||||
|
routing research that identified ADR-028's structural gap and validated
|
||||||
|
the ADR-029 design via POC
|
||||||
- `/workspace/@alkdev/operations/` — TypeScript prior art (`from_openapi.ts`,
|
- `/workspace/@alkdev/operations/` — TypeScript prior art (`from_openapi.ts`,
|
||||||
`from_mcp.ts`, `from_schema.ts`, `scanner.ts`)
|
`from_mcp.ts`, `from_schema.ts`, `scanner.ts`)
|
||||||
- `/workspace/@alkdev/dispatch/` — concrete downstream consumer (container
|
- `/workspace/@alkdev/dispatch/` — concrete downstream consumer (container
|
||||||
|
|||||||
@@ -232,8 +232,9 @@ pub struct HandlerRegistration {
|
|||||||
pub composition_authority: Option<CompositionAuthority>, // None for leaves
|
pub composition_authority: Option<CompositionAuthority>, // None for leaves
|
||||||
pub scoped_env: Option<ScopedOperationEnv>, // None for leaves
|
pub scoped_env: Option<ScopedOperationEnv>, // None for leaves
|
||||||
pub capabilities: Capabilities,
|
pub capabilities: Capabilities,
|
||||||
pub remote_safe: bool, // default false; ADR-028 — exposes this op to
|
// NOTE: ADR-028 added `remote_safe: bool` here; ADR-029 supersedes it and
|
||||||
// CallClient peers (trusted-peer mode bypasses)
|
// removes the field. Peer authorization is `AccessControl::check(peer_identity)`,
|
||||||
|
// not a per-op boolean. See ADR-029 §3.
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -664,7 +665,8 @@ The `Capabilities` type holds non-serializable, zeroized secret material. It doe
|
|||||||
| Operation registry layering | [ADR-024](../../decisions/024-operation-registry-layering.md) | Curated (static, immutable) + session and connection overlays (dynamic); `OperationEnv` as trait-object integration point; `OperationContext.env` split into `scoped_env` (data) and `env` (dispatch trait) |
|
| Operation registry layering | [ADR-024](../../decisions/024-operation-registry-layering.md) | Curated (static, immutable) + session and connection overlays (dynamic); `OperationEnv` as trait-object integration point; `OperationContext.env` split into `scoped_env` (data) and `env` (dispatch trait) |
|
||||||
| Operation error schemas | [ADR-023](../../decisions/023-operation-error-schemas.md) | Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity for `from_openapi`/`to_openapi` |
|
| Operation error schemas | [ADR-023](../../decisions/023-operation-error-schemas.md) | Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity for `from_openapi`/`to_openapi` |
|
||||||
| Call protocol client and adapter contract | [ADR-017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | `from_call`/`from_jsonschema`/`OperationAdapter` produce `HandlerRegistration` bundles; adapter-registered ops are `Internal` leaves. Surface specced in [client-and-adapters.md](client-and-adapters.md) |
|
| Call protocol client and adapter contract | [ADR-017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | `from_call`/`from_jsonschema`/`OperationAdapter` produce `HandlerRegistration` bundles; adapter-registered ops are `Internal` leaves. Surface specced in [client-and-adapters.md](client-and-adapters.md) |
|
||||||
| Peer-scoped registry filtering for CallClient | [ADR-028](../../decisions/028-callclient-peer-scoped-registry-filtering.md) | Default-deny `CallClient` registry view; adds `remote_safe` marking to `HandlerRegistration` (the bundle this doc defines) |
|
| Peer-graph routing model (supersedes ADR-028) | [ADR-029](../../decisions/029-peer-graph-routing-model.md) | Peer-keyed overlays + `PeerRef` routing; peer authorization via `AccessControl::check(peer_identity)`; retires `remote_safe`/`trusted_peer` (the field this doc's `HandlerRegistration` previously gained) |
|
||||||
|
| ~~Peer-scoped registry filtering~~ (superseded) | ~~[ADR-028](../../decisions/028-callclient-peer-scoped-registry-filtering.md)~~ | ~~`remote_safe` marking on `HandlerRegistration`~~ — superseded by ADR-029 |
|
||||||
|
|
||||||
## Open Questions
|
## Open Questions
|
||||||
|
|
||||||
@@ -674,8 +676,14 @@ See [open-questions.md](../../open-questions.md) for full details.
|
|||||||
- **OQ-14** (resolved): Batch is a client-side pattern of correlated `call.requested` events, not a protocol primitive.
|
- **OQ-14** (resolved): Batch is a client-side pattern of correlated `call.requested` events, not a protocol primitive.
|
||||||
- **OQ-16** (resolved by ADR-014): No vault operations are exposed over the call protocol for now.
|
- **OQ-16** (resolved by ADR-014): No vault operations are exposed over the call protocol for now.
|
||||||
- **OQ-19** (resolved): Session-scoped operation registries — agent-written operations overlaid on the curated registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait. Session ops are `Session` provenance (ADR-022) — always `Internal`, compose under restricted authority scoped down at sandbox creation. Generalized by ADR-024 to cover connection-scoped overlays as well.
|
- **OQ-19** (resolved): Session-scoped operation registries — agent-written operations overlaid on the curated registry via `OperationEnv` trait layering. Protocol doesn't need changes; `OperationEnv` must remain a trait. Session ops are `Session` provenance (ADR-022) — always `Internal`, compose under restricted authority scoped down at sandbox creation. Generalized by ADR-024 to cover connection-scoped overlays as well.
|
||||||
- **OQ-25** (open, two-way): Remote-safe marking shape — existence of default-deny `CallClient` filtering locked by ADR-028; the shape (the `remote_safe: bool` field this doc's `HandlerRegistration` gains vs a richer per-peer mechanism) is the two-way-door remainder. See [client-and-adapters.md](client-and-adapters.md).
|
- **OQ-25** (dissolved by ADR-029): `remote_safe` marking shape — moot.
|
||||||
- **OQ-26..28** (open, two-way): `OperationAdapter` error type, `from_call` re-import trigger, `from_call` namespace collision. v1 defaults recorded in [client-and-adapters.md](client-and-adapters.md).
|
`remote_safe`/`trusted_peer` are retired; peer authorization is
|
||||||
|
`AccessControl::check(peer_identity)`, the existing mechanism. See
|
||||||
|
[client-and-adapters.md](client-and-adapters.md) and ADR-029 §3.
|
||||||
|
- **OQ-26..28** (OQ-26/27 stay two-way; OQ-28 cross-peer dissolved by ADR-029 /
|
||||||
|
same-peer stays): `OperationAdapter` error type, `from_call` re-import
|
||||||
|
trigger, `from_call` namespace collision. v1 defaults recorded in
|
||||||
|
[client-and-adapters.md](client-and-adapters.md).
|
||||||
|
|
||||||
## References
|
## References
|
||||||
|
|
||||||
|
|||||||
@@ -360,19 +360,20 @@ noted re-import hot-swap is a two-way door; §3 mentioned the namespace prefix).
|
|||||||
The call-completion gap analysis (`docs/research/alknet-call-completion/gap-analysis.md`
|
The call-completion gap analysis (`docs/research/alknet-call-completion/gap-analysis.md`
|
||||||
DC-1..4) resolved them. The resolutions:
|
DC-1..4) resolved them. The resolutions:
|
||||||
|
|
||||||
### DC-1 — CallClient registry scope: resolved by ADR-028
|
### DC-1 — CallClient registry scope: resolved by ADR-028, superseded by ADR-029
|
||||||
|
|
||||||
The §1 Consequences security dimension is resolved by
|
The §1 Consequences security dimension was originally resolved by ADR-028
|
||||||
[ADR-028](028-callclient-peer-scoped-registry-filtering.md). The one-way
|
(default-deny `remote_safe: bool` + `trusted_peer` opt-in). **ADR-028 is now
|
||||||
door (existence of peer-scoped filtering as the v1 default) is locked:
|
superseded by [ADR-029](029-peer-graph-routing-model.md)** (2026-06-27):
|
||||||
**default-deny**, with a `remote_safe: bool` on `HandlerRegistration`
|
the flat-namespace single-peer model ADR-028 built on cannot express the
|
||||||
v1 shape and a trusted-peer opt-in. The shape of the marking is the
|
head→N-workers pattern, and the `remote_safe`/`trusted_peer` gate duplicates
|
||||||
two-way-door remainder, tracked as OQ-25. This ADR's §1 text ("It has its own
|
the existing `AccessControl`/`Identity` machinery while reintroducing the
|
||||||
operation registry to dispatch incoming calls from the remote side") and
|
blanket-bypass anti-pattern ADR-015 killed. ADR-029 replaces the flat overlay
|
||||||
the Consequences note ("The specific mechanism … is a two-way door") are
|
with peer-keyed overlays + `PeerRef` routing, and retires `remote_safe`/
|
||||||
superseded by ADR-028's decision that the *default* is filtered, not
|
`trusted_peer` in favor of `AccessControl::check(peer_identity)` — the
|
||||||
shared-global. Share-global remains available as the explicit opt-in
|
existing authorization path that was already in the dispatch path. The peer-
|
||||||
(ADR-028 §3).
|
scoping question this section flagged is now answered structurally (peer-keyed
|
||||||
|
overlays), not by a parallel boolean gate.
|
||||||
|
|
||||||
### DC-4 — OperationAdapter trait error type: resolved
|
### DC-4 — OperationAdapter trait error type: resolved
|
||||||
|
|
||||||
|
|||||||
@@ -2,7 +2,20 @@
|
|||||||
|
|
||||||
## Status
|
## Status
|
||||||
|
|
||||||
Accepted
|
**Superseded** by [ADR-029](029-peer-graph-routing-model.md) (2026-06-27).
|
||||||
|
|
||||||
|
ADR-028 introduced `remote_safe: bool` and `trusted_peer: bool` as a parallel
|
||||||
|
authorization system for peer-scoped dispatch. This was a structural miss: the
|
||||||
|
flat-namespace single-peer model it built on cannot express the head→N-workers
|
||||||
|
pattern (the primary use case), and the parallel `remote_safe`/`trusted_peer`
|
||||||
|
gate duplicates the existing `AccessControl`/`Identity` machinery (which
|
||||||
|
already authorizes peer calls) while reintroducing the blanket-bypass
|
||||||
|
anti-pattern ADR-015 was written to kill. ADR-029 replaces the flat overlay
|
||||||
|
with peer-keyed overlays + `PeerRef` routing, and retires `remote_safe`/
|
||||||
|
`trusted_peer` in favor of the existing `AccessControl::check(peer_identity)`.
|
||||||
|
See ADR-029 for the design that replaces this one; see
|
||||||
|
`docs/research/alknet-call-peer-routing/findings.md` for the research that
|
||||||
|
identified the gap.
|
||||||
|
|
||||||
## Context
|
## Context
|
||||||
|
|
||||||
|
|||||||
293
docs/architecture/decisions/029-peer-graph-routing-model.md
Normal file
@@ -0,0 +1,293 @@
# ADR-029: Peer-Graph Routing Model for alknet-call Composition

## Status

Proposed (supersedes ADR-028)

## Context

The call protocol's composition model is **flat per overlay and single-peer**.
`CompositeOperationEnv` holds one `connection: Option<Arc<dyn OperationEnv>>`
overlay; the Layer 2 imported-ops overlay on `CallConnection` is a flat
`HashMap<String, HandlerRegistration>` keyed by operation name. This works for
one remote peer. The head→many-workers / hub→spoke pattern (the ray.io model,
and the primary downstream use case — the container-service rewrite this
completion was supposed to unblock) cannot be expressed:

1. **Overlay collision.** A head importing from worker A and worker B, both
   exposing `/container/exec`, has no way to route
   `invoke("container", "exec")` to the right peer. The composite env holds
   one connection overlay; even with two, `contains("container/exec")` is
   true for both with no disambiguation.

2. **`from_call` namespace prefix is a naming-convention hack.** DC-3 / OQ-28
   made `FromCallConfig::namespace_prefix` the disambiguation mechanism — the
   operator prefixes imported op names so two peers' ops don't collide in a
   flat map. This pushes disambiguation to the caller and into the
   `ScopedOperationEnv { allowed: HashSet<String> }` reachability list. It is
   bolted onto a flat map instead of being structural routing.

3. **ADR-028's `remote_safe: bool` + `trusted_peer: bool` is a second,
   parallel, weaker authorization system.** ADR-028 introduced a
   `RemoteFilter { trusted_peer: bool }` gate in `protocol/dispatch.rs` that
   runs *before* the existing `AccessControl::check`.
   `trusted_peer: true` is a blanket security-bypass flag — the exact
   anti-pattern ADR-015 was written to kill (it replaced `trusted: true` with
   the authority-switch model). ADR-028 reintroduced it at the peer boundary.
   The existing authorization machinery in core (`Identity` with scopes and
   resources, `IdentityProvider`, `AccessControl::check`) is real, grounded,
   and already wired into the dispatch path — ADR-028 should have *used* it for
   peer authorization, not invented a parallel system.

This is a blocking structural fix, not a "v1/later" refinement. The research
at `docs/research/alknet-call-peer-routing/findings.md` validates the design
through a POC that type-checks against the real types (since removed; the
shapes are recorded in the research doc). ADR-028 is superseded by this ADR.

## Decision

### 1. Peer-keyed overlays

The Layer 2 overlay becomes peer-keyed at the composition-env level.
`CompositeOperationEnv`'s singular `connection: Option<Arc<dyn OperationEnv>>`
is replaced by `PeerCompositeEnv` with peer-keyed connections:

```rust
pub struct PeerCompositeEnv {
    pub base: Arc<dyn OperationEnv + Send + Sync>,                          // Layer 0 curated
    pub session: Option<Arc<dyn OperationEnv + Send + Sync>>,               // Layer 1
    pub connections: HashMap<PeerId, Arc<dyn OperationEnv + Send + Sync>>,  // Layer 2, peer-keyed
    connection_order: Vec<PeerId>,  // insertion order for PeerRef::Any first-match
}
```

The per-`CallConnection` overlay stays flat (one connection = one peer — a
flat `HashMap<String, HandlerRegistration>` per connection is correct). The
peer-keying is at the *aggregation* layer: the head node's composition env
holds a `HashMap<PeerId, connection_overlay>`, not one overlay. `PeerId` is
the peer's `Identity.id` — the same field `Connection::identity()` already
exposes, already resolved in the dispatch path, and already unique per peer.
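
The aggregation layer described above can be sketched with plain collections. This is a simplified illustration, not the POC shape from the research doc: `PeerOverlays`, `attach`, `detach`, and `peers_serving` are hypothetical names, and each connection overlay is reduced to a set of op names instead of an `Arc<dyn OperationEnv>`.

```rust
use std::collections::{HashMap, HashSet};

type PeerId = String; // = Identity.id in the ADR

/// Hypothetical stand-in for one connection's flat overlay (op names only).
type Overlay = HashSet<String>;

/// Simplified sketch of the Layer 2 part of `PeerCompositeEnv`.
#[derive(Default)]
struct PeerOverlays {
    connections: HashMap<PeerId, Overlay>,
    connection_order: Vec<PeerId>, // insertion order, used for first-match
}

impl PeerOverlays {
    /// Attach (or replace) the overlay for one peer connection.
    fn attach(&mut self, peer: &str, ops: &[&str]) {
        let overlay: Overlay = ops.iter().map(|s| s.to_string()).collect();
        // Only record insertion order the first time a peer appears.
        if self.connections.insert(peer.to_string(), overlay).is_none() {
            self.connection_order.push(peer.to_string());
        }
    }

    /// Drop a peer's overlay when its connection closes.
    fn detach(&mut self, peer: &str) {
        self.connections.remove(peer);
        self.connection_order.retain(|p| p != peer);
    }

    /// Peers that serve `op`, in insertion order.
    fn peers_serving(&self, op: &str) -> Vec<&str> {
        self.connection_order
            .iter()
            .filter(|p| self.connections[*p].contains(op))
            .map(|p| p.as_str())
            .collect()
    }
}
```

With `worker-a` and `worker-b` both attached and both exposing `container/exec`, `peers_serving("container/exec")` lists both peers. Neither overlay shadows the other, which is the property the flat single-connection layer could not provide.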

### 2. `PeerRef` routing selector

`OperationEnv` gains a peer-routing method with a `PeerRef` selector. The
default-impl preserves back-compat (existing impls that don't override it
delegate to `invoke_with_policy`, preserving current behavior):

```rust
pub enum PeerRef {
    Specific(PeerId),  // route to this peer; NOT_FOUND if it doesn't serve the op
    Any,               // first peer (insertion order) that serves it
}
pub type PeerId = String;  // = Identity.id

// New default methods on the `OperationEnv` trait (rest of the trait elided):
async fn invoke_peer(&self, peer: &PeerRef, namespace: &str, operation: &str,
    input: Value, parent: &OperationContext, policy: AbortPolicy) -> ResponseEnvelope {
    // default: ignore peer selector, dispatch via invoke_with_policy
    self.invoke_with_policy(namespace, operation, input, parent, policy).await
}
fn peer_contains(&self, _peer: &PeerId, name: &str) -> bool { self.contains(name) }
```

`PeerRef::Specific(PeerId)` routes to the named peer's overlay; if that peer
doesn't serve the op, `NOT_FOUND` (no silent fallthrough — explicit routing
must be honored or fail loudly). `PeerRef::Any` routes to the first peer
(insertion order) whose overlay contains the op — the "any worker that serves
this name" fan-out primitive. A richer `RoutingPolicy` (round-robin,
least-loaded) is the two-way-door remainder tracked as OQ-30; the `PeerRef`
enum is designed to compose with it without breaking the signature.

The existing `invoke()` / `invoke_with_policy()` methods stay as the
`PeerRef::Any` equivalent for code that doesn't care about peer selection.
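
The two selector behaviours can be sketched as a pure resolution step that runs before dispatch. This is an illustration under assumptions: `Router`, `with_peer`, `resolve`, and `RouteError` are hypothetical names, overlays are reduced to sets of op names, and the real `invoke_peer` is async and returns a `ResponseEnvelope` instead of a peer id.

```rust
use std::collections::{HashMap, HashSet};

type PeerId = String;

enum PeerRef {
    Specific(PeerId),
    Any,
}

#[derive(Debug, PartialEq)]
enum RouteError {
    NotFound, // hypothetical stand-in for the protocol's NOT_FOUND
}

#[derive(Default)]
struct Router {
    connections: HashMap<PeerId, HashSet<String>>,
    connection_order: Vec<PeerId>,
}

impl Router {
    /// Builder-style helper (hypothetical) to register a peer and its ops.
    fn with_peer(mut self, peer: &str, ops: &[&str]) -> Self {
        let set = ops.iter().map(|s| s.to_string()).collect();
        if self.connections.insert(peer.to_string(), set).is_none() {
            self.connection_order.push(peer.to_string());
        }
        self
    }

    /// Pick the peer that should serve `op`, or fail loudly.
    fn resolve(&self, peer: &PeerRef, op: &str) -> Result<&str, RouteError> {
        match peer {
            // Explicit routing: no fallthrough to other peers.
            PeerRef::Specific(id) => match self.connections.get_key_value(id) {
                Some((key, ops)) if ops.contains(op) => Ok(key.as_str()),
                _ => Err(RouteError::NotFound),
            },
            // First match in insertion order.
            PeerRef::Any => self
                .connection_order
                .iter()
                .find(|id| self.connections[*id].contains(op))
                .map(|id| id.as_str())
                .ok_or(RouteError::NotFound),
        }
    }
}
```

In this sketch `PeerRef::Any` for `container/exec` resolves to whichever worker was registered first, while `PeerRef::Specific` for a peer that does not serve the op yields `NotFound` even if another peer does serve it. A future `RoutingPolicy` (OQ-30) would only change the `PeerRef::Any` arm.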
|
||||||
|
|
||||||
|
### 3. `AccessControl`-based peer authorization; retire `remote_safe`/`trusted_peer`
|
||||||
|
|
||||||
|
`RemoteFilter`, `HandlerRegistration::remote_safe`,
|
||||||
|
`CallClient::trusted_peer`, `OperationRegistry::list_operations_peer_scoped`,
|
||||||
|
and `services_list_handler_peer_scoped` are **removed**. Peer authorization
|
||||||
|
flows through the existing `AccessControl::check` against the peer's resolved
|
||||||
|
`Identity`:
|
||||||
|
|
||||||
|
- A remote peer's call arrives → `dispatch_requested` resolves the peer's
|
||||||
|
`Identity` (already does, from the connection's TLS fingerprint or the
|
||||||
|
`auth_token` payload) → `OperationRegistry::invoke` runs
|
||||||
|
`AccessControl::check(peer_identity)`.
|
||||||
|
- If the op's `AccessControl` is satisfied → dispatch (capabilities populated
|
||||||
|
from the bundle, same as today).
|
||||||
|
- If not → `FORBIDDEN` (capabilities never populated — the security property
|
||||||
|
ADR-028 wanted, achieved by the existing ACL, not a parallel gate).
|
||||||
|
- If the op is `Visibility::Internal` → `NOT_FOUND` before ACL (existing
|
||||||
|
behavior). This is the "never callable from wire" case.
|
||||||
|
|
||||||
|
The three cases `remote_safe` was meant to handle map to existing mechanisms:
|
||||||
|
|
||||||
|
| `remote_safe` case | Replacement |
|---|---|
| Op callable by any peer (was `remote_safe: true`) | `AccessControl::default()`: no restrictions; implicitly "remote-safe" because it requires no privileged scope. |
| Op callable only by some peers | `AccessControl { required_scopes: [...] }`: only peers whose `Identity.scopes` satisfy the AND-gate may call. Per-peer differentiation via `IdentityProvider` config. |
| Op never callable from wire | `Visibility::Internal`: `NOT_FOUND` before ACL. Existing mechanism, unchanged. |
|
||||||
|
|
||||||
|
**The op's `AccessControl` *is* the peer-authorization policy.** There is no
|
||||||
|
separate exposure decision. If the peer's `Identity` satisfies the op's
|
||||||
|
`AccessControl`, the op dispatches and capabilities populate (same as for any
|
||||||
|
authorized caller). If not, `FORBIDDEN` before the handler — capabilities
|
||||||
|
never populate. The exposure decision and the authorization decision are the
|
||||||
|
same decision, made through one mechanism, not two.
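
The ordering described in this section (visibility first, then the ACL, then dispatch) can be sketched as below. This is a minimal illustration and not the crate's code: `Outcome`, `Op`, and `dispatch_from_wire` are hypothetical names, `AccessControl` is reduced to its AND-gate, and the sketch assumes an empty `required_scopes` passes for every caller.

```rust
// Hypothetical, simplified stand-ins for the real types.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Dispatched,
    Forbidden,
    NotFound,
}

pub enum Visibility {
    External,
    Internal,
}

pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}

#[derive(Default)]
pub struct AccessControl {
    pub required_scopes: Vec<String>, // AND-gate only in this sketch
}

impl AccessControl {
    /// Every required scope must be present on the identity.
    /// Assumption: no required scopes means no restriction.
    pub fn check(&self, identity: Option<&Identity>) -> bool {
        if self.required_scopes.is_empty() {
            return true;
        }
        match identity {
            Some(id) => self.required_scopes.iter().all(|s| id.scopes.contains(s)),
            None => false,
        }
    }
}

pub struct Op {
    pub visibility: Visibility,
    pub access: AccessControl,
}

/// Wire-call gate: Internal ops are NOT_FOUND before the ACL runs;
/// a failed ACL is FORBIDDEN before any handler (or capability) is touched.
pub fn dispatch_from_wire(op: &Op, peer: Option<&Identity>) -> Outcome {
    if matches!(op.visibility, Visibility::Internal) {
        return Outcome::NotFound;
    }
    if !op.access.check(peer) {
        return Outcome::Forbidden;
    }
    Outcome::Dispatched
}
```

Because the `FORBIDDEN` branch returns before any handler state is built, the "capabilities never populate" property falls out of the ordering rather than from a separate gate.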
|
||||||
|
|
||||||
|
### 4. Peer-qualified reachability (`ScopedPeerEnv`)
|
||||||
|
|
||||||
|
`ScopedOperationEnv { allowed: HashSet<String> }` is extended with an optional
|
||||||
|
peer-pinned allowlist. Unqualified reachability (peer-agnostic composition —
|
||||||
|
"I want to call `container/exec` on whichever worker serves it") stays the
|
||||||
|
common case; peer-pinning is opt-in for the disambiguation case that replaces
|
||||||
|
`FromCallConfig::namespace_prefix`:
|
||||||
|
|
||||||
|
```rust
pub struct ScopedPeerEnv {
    pub allowed_ops: HashSet<String>,  // peer-agnostic: reachable via PeerRef::Any
    pub peer_pinned: HashSet<String>,  // "peer-id/op-name": reachable only via PeerRef::Specific(that peer)
}
```
|
||||||
|
|
||||||
|
Instead of prefixing the *op name* (the flat-namespace hack), you pin the
|
||||||
|
*peer* in the reachability set. The existing `ScopedOperationEnv.allowed`
|
||||||
|
becomes the `allowed_ops` field; peer-pinning is additive.
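
A reachability check over the two sets might look like the following sketch. The method name `is_reachable` is hypothetical, `PeerId` is assumed to be a `String` alias, and the sketch assumes that a peer-agnostic op is also reachable through `PeerRef::Specific`; the ADR text does not settle that last point.

```rust
use std::collections::HashSet;

pub type PeerId = String; // assumption: PeerId aliases Identity.id

pub enum PeerRef {
    Specific(PeerId),
    Any,
}

pub struct ScopedPeerEnv {
    pub allowed_ops: HashSet<String>,
    pub peer_pinned: HashSet<String>, // entries shaped "peer-id/op-name"
}

impl ScopedPeerEnv {
    /// Hypothetical helper: may this handler reach `op` through `peer`?
    pub fn is_reachable(&self, peer: &PeerRef, op: &str) -> bool {
        match peer {
            // Peer-agnostic composition: only the unqualified allowlist applies.
            PeerRef::Any => self.allowed_ops.contains(op),
            // Pinned composition: the op must be pinned to this exact peer,
            // or (assumption) be listed as peer-agnostic.
            PeerRef::Specific(id) => {
                self.allowed_ops.contains(op)
                    || self.peer_pinned.contains(&format!("{}/{}", id, op))
            }
        }
    }
}
```

Note that the pinned key embeds the peer id ahead of an op name that itself contains `/`, so a real implementation would probably prefer a `(PeerId, String)` pair over string concatenation.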
|
||||||
|
|
||||||
|
### 5. `from_call` peer-keyed registration; collision rule change
|
||||||
|
|
||||||
|
`from_call` registers into the specific peer's sub-overlay, not a flat
|
||||||
|
overlay. Cross-peer collision dissolves: same name on different peers is fine
|
||||||
|
(separate sub-overlays, no collision, no prefix needed). Same-peer collision
|
||||||
|
stays an error (a peer shouldn't expose two ops with the same name).
|
||||||
|
|
||||||
|
`FromCallConfig::namespace_prefix` becomes optional local-naming sugar for
|
||||||
|
the case where the importing node wants to expose a peer's ops under a
|
||||||
|
different name *locally* — a local-naming concern, not a disambiguation
|
||||||
|
concern. It defaults to `None`.
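
The collision rule can be illustrated with a nested map. This is a sketch under assumptions: `PeerOverlays` and `register` are hypothetical names, the registration payload is reduced to the op name, and `SamePeerCollision` borrows the variant name that OQ-26 floats as a possibility.

```rust
use std::collections::{HashMap, HashSet};

pub type PeerId = String; // assumption: PeerId aliases Identity.id

#[derive(Debug, PartialEq)]
pub struct SamePeerCollision {
    pub peer: PeerId,
    pub op: String,
}

#[derive(Default)]
pub struct PeerOverlays {
    by_peer: HashMap<PeerId, HashSet<String>>,
}

impl PeerOverlays {
    /// Register `op` into `peer`'s sub-overlay. The same name on another
    /// peer is fine; the same name twice on one peer is an error.
    pub fn register(&mut self, peer: &str, op: &str) -> Result<(), SamePeerCollision> {
        let ops = self.by_peer.entry(peer.to_string()).or_default();
        // HashSet::insert returns false when the value was already present.
        if !ops.insert(op.to_string()) {
            return Err(SamePeerCollision {
                peer: peer.to_string(),
                op: op.to_string(),
            });
        }
        Ok(())
    }
}
```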
|
||||||
|
|
||||||
|
### 6. `services/list` `AccessControl`-filtered; `services/list-peers` opt-in
|
||||||
|
|
||||||
|
`services/list` filters by `AccessControl::check(calling_peer_identity)` — the
|
||||||
|
calling peer sees only ops it is authorized to call. The
|
||||||
|
`services_list_handler` / `services_list_handler_peer_scoped` split collapses
|
||||||
|
to a single `AccessControl`-filtered handler. `services/list-peers` is the
|
||||||
|
opt-in for peer-attributed re-export listing (each peer's sub-overlay listed
|
||||||
|
with attribution, filtered by the calling peer's authorization).
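
As a rough sketch of the single filtered handler, the listing reduces to a filter over the same check that gates dispatch. `ListedOp` and `services_list` are hypothetical names, and only the AND-gate over scopes is modelled here.

```rust
pub struct Identity {
    pub scopes: Vec<String>,
}

// Hypothetical listing entry: op name plus the scopes its ACL requires.
pub struct ListedOp {
    pub name: String,
    pub required_scopes: Vec<String>,
}

/// Return only the ops the calling peer is authorized to call, so that
/// discovery and dispatch agree by construction.
pub fn services_list<'a>(ops: &'a [ListedOp], caller: &Identity) -> Vec<&'a str> {
    ops.iter()
        .filter(|op| op.required_scopes.iter().all(|s| caller.scopes.contains(s)))
        .map(|op| op.name.as_str())
        .collect()
}
```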
|
||||||
|
|
||||||
|
## Consequences
|
||||||
|
|
||||||
|
**Positive:**
|
||||||
|
- The head→N-workers pattern works. A head with multiple worker connections
routes `invoke_peer()` to the right peer via `PeerRef`. This is the primary use
case the previous model couldn't express.
|
||||||
|
- One authorization system, not two. Peer authorization flows through the
|
||||||
|
existing `AccessControl`/`Identity` machinery — the same mechanism that
|
||||||
|
gates every other call. No parallel `remote_safe` gate, no blanket-bypass
|
||||||
|
`trusted_peer` flag. Per-peer differentiation is via `IdentityProvider`
|
||||||
|
config (different peers get different scopes), which is a real
|
||||||
|
authorization decision, not a boolean.
|
||||||
|
- Structural disconnect cleanup. When a peer disconnects, its sub-overlay
|
||||||
|
drops (the `PeerId` key is removed from `connections`). No stale overlay,
|
||||||
|
no explicit deregistration. An in-flight `PeerRef::Specific(that_peer)` gets
|
||||||
|
`NOT_FOUND` — the correct failure mode.
|
||||||
|
- `from_call` collision dissolves across peers. Two workers exposing
|
||||||
|
`/container/exec` coexist; the prefix is no longer the disambiguation
|
||||||
|
mechanism.
|
||||||
|
- The `OperationEnv` trait gains a method with a default-impl, preserving
|
||||||
|
back-compat. Existing impls (`LocalOperationEnv`, `OverlayOperationEnv`)
|
||||||
|
work unchanged; `PeerCompositeEnv` overrides with real peer routing.
|
||||||
|
- The peer-keyed overlay model extends naturally to multi-hop federation (a
|
||||||
|
chain of `PeerRef::Specific` routing decisions) without redesign. Petgraph
|
||||||
|
is not needed for v1 (one-hop, shallow); it pays off if multi-hop
|
||||||
|
path-finding becomes real (OQ-32).
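
The structural disconnect cleanup noted above can be sketched as follows. The `Connections` wrapper, `detach_peer`, and `route_specific` are hypothetical names (only `attach_peer` appears in the ADR, with a different signature), and each sub-overlay is reduced to a set of op names.

```rust
use std::collections::{HashMap, HashSet};

pub type PeerId = String; // assumption: PeerId aliases Identity.id

#[derive(Debug, PartialEq)]
pub enum RouteError {
    NotFound,
}

#[derive(Default)]
pub struct Connections {
    connections: HashMap<PeerId, HashSet<String>>,
}

impl Connections {
    pub fn attach_peer(&mut self, peer: &str, ops: &[&str]) {
        let ops = ops.iter().map(|o| o.to_string()).collect();
        self.connections.insert(peer.to_string(), ops);
    }

    /// Disconnect: removing the key drops the whole sub-overlay.
    pub fn detach_peer(&mut self, peer: &str) {
        self.connections.remove(peer);
    }

    /// PeerRef::Specific routing: a missing peer or a missing op is NOT_FOUND.
    pub fn route_specific(&self, peer: &str, op: &str) -> Result<PeerId, RouteError> {
        match self.connections.get(peer) {
            Some(ops) if ops.contains(op) => Ok(peer.to_string()),
            _ => Err(RouteError::NotFound),
        }
    }
}
```

No per-op deregistration step exists in this shape, which is the point of keying the overlay by peer.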
|
||||||
|
|
||||||
|
**Negative:**
|
||||||
|
- `CompositeOperationEnv` → `PeerCompositeEnv` is a migration. Existing call
|
||||||
|
sites that construct `CompositeOperationEnv::new(base, Some(conn), session)`
|
||||||
|
migrate to `PeerCompositeEnv::new(base).with_session(session).attach_peer(peer_id, conn)`.
|
||||||
|
The singular-connection case (one peer) is the degenerate case
|
||||||
|
(`connections` with one entry).
|
||||||
|
- `OperationEnv` trait gains a method. The default-impl preserves back-compat,
|
||||||
|
but it's a trait surface change; downstream impls (`alknet-http`,
|
||||||
|
`alknet-agent`) gain the method with the default delegation.
|
||||||
|
- `services/list` semantics change: the filter is `AccessControl`-based, not
|
||||||
|
`remote_safe`-based. An op with `AccessControl::default()` (no restrictions)
|
||||||
|
is now listed to any peer — this is correct (it's implicitly callable by
|
||||||
|
any authenticated peer), but operators who relied on `remote_safe: false` to
|
||||||
|
hide ops from peers must instead set `required_scopes` or `Visibility::Internal`.
|
||||||
|
- ADR-028 is superseded. The `remote_safe` field, `trusted_peer` flag,
|
||||||
|
`RemoteFilter`, `list_operations_peer_scoped`, and
|
||||||
|
`services_list_handler_peer_scoped` are removed. Code that references them
|
||||||
|
(the `CallClient`, `Dispatcher`, `HandlerRegistration`, `discovery.rs`)
|
||||||
|
changes. This is the cost of fixing a one-way-door miss — the previous model
|
||||||
|
shipped and was reviewed before the structural gap was caught.
|
||||||
|
- `PeerId = Identity.id` (the fingerprint) is not stable across key rotation.
|
||||||
|
A peer that rotates its TLS key gets a new `PeerId`; in-flight
|
||||||
|
`PeerRef::Specific(old_id)` gets `NOT_FOUND` after reconnect. For the
|
||||||
|
immediate use case (head→workers where the operator controls key rotation),
|
||||||
|
this is acceptable. A stable logical node name decoupled from cryptographic
|
||||||
|
identity is the cleaner long-term shape (assumption 1).
|
||||||
|
|
||||||
|
## Assumptions
|
||||||
|
|
||||||
|
1. **`PeerId = Identity.id` (the fingerprint).** Reconnects with a rotated key
|
||||||
|
change the `PeerId`; the peer-keyed overlay drops the old `PeerId`'s
|
||||||
|
sub-overlay and creates a new one. An in-flight `PeerRef::Specific(old_id)`
|
||||||
|
gets `NOT_FOUND`. This is acceptable for v1 (operator-controlled key
|
||||||
|
rotation in the head→workers pattern). A stable logical node name separate
|
||||||
|
from the cryptographic identity is a future question; the peer-keyed overlay
|
||||||
|
model accommodates it by changing what `PeerId` aliases, not by redesign.
|
||||||
|
|
||||||
|
2. **`PeerRef::Any` = insertion-order first-match.** Deterministic but
|
||||||
|
order-dependent (worker A connects before worker B → `Any` routes to A
|
||||||
|
until A disconnects). This is the simplest routing policy and is correct for
|
||||||
|
the immediate use case (the head picks the first worker that serves the
|
||||||
|
op). A richer `RoutingPolicy` (round-robin, least-loaded, affinity) is OQ-30;
|
||||||
|
the `PeerRef` enum composes with it without breaking the signature.
|
||||||
|
|
||||||
|
3. **`services/list` defaults to "own ops only" (unchanged from today).**
|
||||||
|
Re-exported peer ops are not listed unless the calling peer invokes
|
||||||
|
`services/list-peers` (the opt-in). The re-export policy (which peers' ops a
|
||||||
|
given peer sees) is an `AccessControl` decision on the listing op.
|
||||||
|
|
||||||
|
4. **Capability exposure under `PeerRef::Any`.** When a handler composes via
|
||||||
|
`Any` and routing picks worker A, the handler's `Capabilities` propagate to
|
||||||
|
worker A's call (same as today's `from_call` forwarding). This is correct:
|
||||||
|
the handler declared the op in its scoped env, so it authorized the
|
||||||
|
composition; the peer selection is a routing detail. If a handler needs
|
||||||
|
per-peer capability scoping, it uses `PeerRef::Specific` and peer-pinned
|
||||||
|
reachability.
|
||||||
|
|
||||||
|
5. **Multi-hop federation is out of scope for v1.** Worker A does not
|
||||||
|
transitively see worker B's ops through the head unless the head explicitly
|
||||||
|
re-exports them. The peer-keyed overlay model extends to multi-hop without
|
||||||
|
redesign (a chain of `PeerRef::Specific` decisions), but path-finding
|
||||||
|
(which peer reaches which op transitively) is where petgraph would pay off
|
||||||
|
(OQ-32, not designed).
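
Assumption 2's insertion-order first-match can be illustrated with a map plus an order vector. This is a sketch only: `AnyRouter`, `route_any`, and `detach_peer` are hypothetical names, and the sub-overlays are reduced to sets of op names.

```rust
use std::collections::{HashMap, HashSet};

pub type PeerId = String; // assumption: PeerId aliases Identity.id

#[derive(Default)]
pub struct AnyRouter {
    connections: HashMap<PeerId, HashSet<String>>,
    connection_order: Vec<PeerId>, // HashMap has no stable order, so track it here
}

impl AnyRouter {
    pub fn attach_peer(&mut self, peer: &str, ops: &[&str]) {
        let ops: HashSet<String> = ops.iter().map(|o| o.to_string()).collect();
        // Only record the position the first time this peer is seen.
        if self.connections.insert(peer.to_string(), ops).is_none() {
            self.connection_order.push(peer.to_string());
        }
    }

    pub fn detach_peer(&mut self, peer: &str) {
        self.connections.remove(peer);
        self.connection_order.retain(|p| p.as_str() != peer);
    }

    /// PeerRef::Any: the earliest-attached peer that serves `op`.
    pub fn route_any(&self, op: &str) -> Option<&PeerId> {
        self.connection_order.iter().find(|p| {
            self.connections
                .get(p.as_str())
                .map_or(false, |ops| ops.contains(op))
        })
    }
}
```

With workers A and B attached in that order, `route_any` keeps returning A until A is detached, which matches the order-dependence the assumption calls out.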
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- ADR-015: Privilege Model and Authority Context (the authority-switch pattern
|
||||||
|
ADR-028 violated by reintroducing a blanket-bypass flag)
|
||||||
|
- ADR-017: Call Protocol Client and Adapter Contract (amended: `CallClient`
|
||||||
|
no longer has `trusted_peer`; the client/adapter spec updates)
|
||||||
|
- ADR-022: Handler Registration, Provenance, and Composition Authority
|
||||||
|
(`remote_safe` field removed from the registration bundle)
|
||||||
|
- ADR-024: Operation Registry Layering (Layer 2 becomes peer-keyed at the
|
||||||
|
composition-env aggregation level)
|
||||||
|
- ADR-028: Peer-Scoped Registry Filtering for CallClient Inbound Dispatch
|
||||||
|
(superseded)
|
||||||
|
- OQ-25: dissolved (no `remote_safe` marking — `AccessControl` is the policy)
|
||||||
|
- OQ-26: stays (`AdapterError` — a `SamePeerCollision` variant may replace
|
||||||
|
the flat `Conflict` variant)
|
||||||
|
- OQ-27: stays (re-import trigger — unchanged; the overlay is now peer-scoped)
|
||||||
|
- OQ-28: dissolved cross-peer (same name on different peers is fine); stays
|
||||||
|
same-peer
|
||||||
|
- OQ-29: stays (TLS client-auth — orthogonal to the routing model)
|
||||||
|
- OQ-30: `PeerRef::Any` routing policy (new — round-robin/least-loaded)
|
||||||
|
- OQ-31: `services/list-peers` re-export semantics (new)
|
||||||
|
- OQ-32: Multi-hop federation (new — petgraph candidate)
|
||||||
|
- Research: `docs/research/alknet-call-peer-routing/findings.md`
|
||||||
|
- Prior art: Ray.io actors (`ActorHandle` = `PeerRef::Specific`), Dapr service
|
||||||
|
invocation (app-ID routing = `PeerRef::Specific`, access-control allowlist =
|
||||||
|
`AccessControl`-based peer authorization)
|
||||||
@@ -319,41 +319,32 @@ These questions are acknowledged but not active. They will be promoted to open w
|
|||||||
|
|
||||||
## Theme: Call Client and Adapters
|
## Theme: Call Client and Adapters
|
||||||
|
|
||||||
These open questions are the two-way-door remainders from the
|
These open questions are the remainders from the call-completion gap analysis
|
||||||
call-completion gap analysis
|
(`docs/research/alknet-call-completion/gap-analysis.md`, DC-1..4) and the
|
||||||
(`docs/research/alknet-call-completion/gap-analysis.md`, DC-1..4). The
|
peer-graph routing research (`docs/research/alknet-call-peer-routing/findings.md`).
|
||||||
one-way door among them (DC-1, the *existence* of peer-scoped filtering as
|
ADR-029 supersedes ADR-028 and dissolves OQ-25 and the cross-peer half of
|
||||||
the default) is resolved by ADR-028; what remains open here is the shape.
|
OQ-28; the remaining two-way-door shape/defaults are recorded in
|
||||||
The v1 defaults for DC-2/3/4 are recorded in
|
|
||||||
[client-and-adapters.md](crates/call/client-and-adapters.md) and may be
|
[client-and-adapters.md](crates/call/client-and-adapters.md) and may be
|
||||||
revisited during implementation without a new ADR.
|
revisited during implementation without a new ADR.
|
||||||
|
|
||||||
### OQ-25: Remote-Safe Marking Shape for CallClient Peer-Scoped Filtering
|
### OQ-25: ~~Remote-Safe Marking Shape for CallClient Peer-Scoped Filtering~~ (Dissolved by ADR-029)
|
||||||
|
|
||||||
- **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 (§1 Consequences), ADR-028
|
- **Origin**: [client-and-adapters.md](crates/call/client-and-adapters.md), ADR-017 (§1 Consequences), ADR-028
|
||||||
- **Status**: open
|
- **Status**: **dissolved** (ADR-029)
|
||||||
- **Door type**: Two-way (shape only — existence is one-way, resolved by ADR-028)
|
- **Door type**: ~~Two-way (shape only — existence is one-way, resolved by ADR-028)~~
|
||||||
- **Priority**: medium
|
- **Priority**: ~~medium~~
|
||||||
- **Resolution**: ADR-028 locks the one-way door: a `CallClient`'s registry
|
- **Resolution**: **Dissolved by [ADR-029](decisions/029-peer-graph-routing-model.md).**
|
||||||
view is **default-deny** (no operation is exposed to the remote peer unless
|
ADR-028's `remote_safe: bool` / `trusted_peer` model is superseded — it was a
|
||||||
explicitly marked remote-safe), with share-global as an explicit trusted-peer
|
parallel, weaker authorization system that duplicated the existing
|
||||||
opt-in. The v1 shape is a `remote_safe: bool` field on
|
`AccessControl`/`Identity` machinery. ADR-029 retires `remote_safe`/
|
||||||
`HandlerRegistration` (default `false` across all provenance). The shape is
|
`trusted_peer` entirely; peer authorization flows through
|
||||||
the two-way-door remainder: a boolean is the simplest shape that supports
|
`AccessControl::check(peer_identity)`. The op's `AccessControl` *is* the
|
||||||
default-deny; a deployment that needs per-peer differentiation (different
|
peer-authorization policy — there is no separate marking. Per-peer
|
||||||
subsets exposed to different peers on the same node) needs a richer
|
differentiation is via `IdentityProvider` config (different peers get
|
||||||
mechanism — per-peer allowlist, capability-class tag, or a peer-id-keyed map
|
different scopes), not a per-op boolean. The "shape" question is moot
|
||||||
on the registration. v1's boolean limits this to "remote-safe for any peer"
|
because there is no marking to shape. See ADR-029 §3.
|
||||||
vs "not", which is acceptable for the runner/dispatch pattern (one remote
|
|
||||||
peer per `CallClient`). A future ADR may amend or supersede ADR-028's shape
|
|
||||||
without revisiting the *existence* of filtering. Also open under this OQ:
|
|
||||||
whether a richer shape should *expose-but-deny* non-remote-safe ops in
|
|
||||||
`services/list` (returning `NOT_FOUND` on call) instead of *hiding* them.
|
|
||||||
v1 hides them — a peer should not see ops it cannot call, so discovery and
|
|
||||||
dispatch filters agree (ADR-028 Assumption 2); expose-but-deny is the
|
|
||||||
richer-shape question, not a v1 question.
|
|
||||||
- **Cross-references**: ADR-009, ADR-014, ADR-015, ADR-017, ADR-022, ADR-024,
|
- **Cross-references**: ADR-009, ADR-014, ADR-015, ADR-017, ADR-022, ADR-024,
|
||||||
ADR-028, [client-and-adapters.md](crates/call/client-and-adapters.md),
|
~~ADR-028~~ (superseded), ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md),
|
||||||
[operation-registry.md](crates/call/operation-registry.md)
|
[operation-registry.md](crates/call/operation-registry.md)
|
||||||
|
|
||||||
### OQ-26: OperationAdapter Error Type (AdapterError Variants)
|
### OQ-26: OperationAdapter Error Type (AdapterError Variants)
|
||||||
@@ -408,7 +399,16 @@ revisited during implementation without a new ADR.
|
|||||||
no ADR needed. The alternative (last-wins) would silently mask one
|
no ADR needed. The alternative (last-wins) would silently mask one
|
||||||
remote's op behind another's, which is the kind of surprise the
|
remote's op behind another's, which is the kind of surprise the
|
||||||
default-deny posture exists to avoid.
|
default-deny posture exists to avoid.
|
||||||
- **Cross-references**: ADR-015, ADR-017, ADR-028, [client-and-adapters.md](crates/call/client-and-adapters.md)
|
|
||||||
|
**Cross-peer collision dissolved by ADR-029.** Under the peer-keyed overlay
|
||||||
|
model, same name on different peers is fine — they live in separate
|
||||||
|
peer sub-overlays, no collision, no prefix needed. The collision rule now
applies only *within* a peer (same name on the same peer is still an error:
|
||||||
|
a peer shouldn't expose two ops with the same name). `FromCallConfig::namespace_prefix`
|
||||||
|
becomes optional local-naming sugar, not the disambiguation mechanism. See
|
||||||
|
ADR-029 §5.
|
||||||
|
- **Cross-references**: ADR-015, ADR-017, ~~ADR-028~~ (superseded), ADR-029,
|
||||||
|
[client-and-adapters.md](crates/call/client-and-adapters.md)
|
||||||
|
|
||||||
### OQ-29: CallClient TLS Client-Auth and Remote-Identity Verification
|
### OQ-29: CallClient TLS Client-Auth and Remote-Identity Verification
|
||||||
|
|
||||||
@@ -432,4 +432,57 @@ revisited during implementation without a new ADR.
|
|||||||
call-protocol `auth_token` payload field, not TLS, so the no-env-vars
|
call-protocol `auth_token` payload field, not TLS, so the no-env-vars
|
||||||
invariant holds independently of this gap. Decided during a future task that
|
invariant holds independently of this gap. Decided during a future task that
|
||||||
wires RawKey client-auth; recorded here, not in a full ADR.
|
wires RawKey client-auth; recorded here, not in a full ADR.
|
||||||
- **Cross-references**: ADR-014, ADR-017, ADR-027, [client-and-adapters.md](crates/call/client-and-adapters.md), [endpoint.md](crates/core/endpoint.md)
|
- **Cross-references**: ADR-014, ADR-017, ADR-027, [client-and-adapters.md](crates/call/client-and-adapters.md), [endpoint.md](crates/core/endpoint.md)
|
||||||
|
|
||||||
|
### OQ-30: PeerRef::Any Routing Policy
|
||||||
|
|
||||||
|
- **Origin**: [ADR-029](decisions/029-peer-graph-routing-model.md) §2, [client-and-adapters.md](crates/call/client-and-adapters.md), `docs/research/alknet-call-peer-routing/findings.md` §3.2
|
||||||
|
- **Status**: open
|
||||||
|
- **Door type**: Two-way
|
||||||
|
- **Priority**: low
|
||||||
|
- **Resolution**: v1 `PeerRef::Any` uses insertion-order first-match —
|
||||||
|
deterministic but order-dependent (worker A connects before worker B → `Any`
|
||||||
|
routes to A until A disconnects). This is the simplest routing policy and is
|
||||||
|
correct for the immediate use case (the head picks the first worker that
|
||||||
|
serves the op). A richer `RoutingPolicy` (round-robin, least-loaded,
|
||||||
|
affinity) is the two-way-door remainder; the `PeerRef` enum is designed to
|
||||||
|
compose with a `Route { selector, policy }` struct without breaking the
|
||||||
|
`invoke_peer` signature. Decided during implementation when a fan-out use
|
||||||
|
case needs it; recorded here, not in a full ADR.
|
||||||
|
- **Cross-references**: ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md)
|
||||||
|
|
||||||
|
### OQ-31: services/list-peers Re-Export Semantics
|
||||||
|
|
||||||
|
- **Origin**: [ADR-029](decisions/029-peer-graph-routing-model.md) §6, `docs/research/alknet-call-peer-routing/findings.md` §3.5
|
||||||
|
- **Status**: open
|
||||||
|
- **Door type**: Two-way
|
||||||
|
- **Priority**: low
|
||||||
|
- **Resolution**: v1 defaults to "own ops only" — `services/list` shows the
|
||||||
|
head's own Layer 0 `External` ops, filtered by `AccessControl::check(calling_peer)`,
|
||||||
|
unchanged from today (minus the `remote_safe` filter). A `services/list-peers`
|
||||||
|
opt-in (new built-in operation) lists the peer overlays with attribution:
|
||||||
|
each peer's sub-overlay listed as `{ peer: Option<PeerId>, operations: [...] }`,
|
||||||
|
filtered by the calling peer's authorization. Whether re-exported peer ops
|
||||||
|
are listed by default, opt-in, or per-peer-policy is the two-way-door
|
||||||
|
remainder; v1 is opt-in (`services/list-peers`). The re-export policy is an
|
||||||
|
`AccessControl` decision on the listing op. Decided during implementation
|
||||||
|
when a consumer needs peer-attributed discovery; recorded here, not in a
|
||||||
|
full ADR.
|
||||||
|
- **Cross-references**: ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md)
|
||||||
|
|
||||||
|
### OQ-32: Multi-Hop Federation
|
||||||
|
|
||||||
|
- **Origin**: [ADR-029](decisions/029-peer-graph-routing-model.md) Assumption 5, `docs/research/alknet-call-peer-routing/findings.md` §3.7
|
||||||
|
- **Status**: open
|
||||||
|
- **Door type**: One-way (federation model), two-way (mechanism)
|
||||||
|
- **Priority**: low
|
||||||
|
- **Resolution**: v1 is one-hop — worker A does not transitively see worker
|
||||||
|
B's ops through the head unless the head explicitly re-exports them. The
|
||||||
|
peer-keyed overlay model extends to multi-hop without redesign (a chain of
|
||||||
|
`PeerRef::Specific` routing decisions), but path-finding (which peer reaches
|
||||||
|
which op transitively) is where a graph library (petgraph) would pay off.
|
||||||
|
For v1 (one hop, shallow), a nested `HashMap<PeerId, HashMap<String, ...>>`
|
||||||
|
suffices. Whether multi-hop federation becomes a real use case is a future
|
||||||
|
decision; the peer-keyed model does not foreclose it. Not designed; tracked
|
||||||
|
here so the v1 model's extendability is recorded.
|
||||||
|
- **Cross-references**: ADR-029, [client-and-adapters.md](crates/call/client-and-adapters.md)
|
||||||
803
docs/research/alknet-call-peer-routing/findings.md
Normal file
@@ -0,0 +1,803 @@
|
|||||||
|
# Research: Peer-Graph Routing Model for alknet-call Composition
|
||||||
|
|
||||||
|
**Status**: Complete
|
||||||
|
**Date**: 2026-06-27
|
||||||
|
**Scope**: Deep dive — structural design fix, POC-validated
|
||||||
|
**Supersedes**: ADR-028 (to be superseded by a new ADR; draft included in §11)
|
||||||
|
**POC**: Validated in-repo against real types, then removed. See §7.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Problem Statement
|
||||||
|
|
||||||
|
The call protocol's composition model is **flat per overlay and single-peer**.
|
||||||
|
This works for one remote peer and breaks the moment a head node has two
|
||||||
|
workers. The breakage is structural, not a missing default:
|
||||||
|
|
||||||
|
1. **Overlay collision.** `CompositeOperationEnv` holds **one** `connection:
|
||||||
|
Option<Arc<dyn OperationEnv>>` overlay (`registry/env.rs:96-100`). The
|
||||||
|
Layer 2 imported-ops overlay on `CallConnection` is a flat
|
||||||
|
`HashMap<String, HandlerRegistration>` keyed by operation name
|
||||||
|
(`protocol/connection.rs:36`). When a head imports from worker A and
|
||||||
|
worker B, both exposing `/container/exec`, there is no way to route
|
||||||
|
`invoke("container", "exec")` to the right peer. `from_call` against A
|
||||||
|
and B both register `container/exec` into their respective connection
|
||||||
|
overlays, but the composite env can hold only one connection layer — and
|
||||||
|
even if it held two, `contains("container/exec")` returns true for both
|
||||||
|
with no way to disambiguate.
|
||||||
|
|
||||||
|
2. **`from_call` namespace prefix is a naming-convention hack.** DC-3 / OQ-28
|
||||||
|
made `FromCallConfig::namespace_prefix` the disambiguation mechanism: the
|
||||||
|
operator prefixes imported op names (`worker-a/container/exec`) so two
|
||||||
|
peers' ops don't collide in a flat map. This pushes disambiguation to the
|
||||||
|
caller and into the `ScopedOperationEnv { allowed: HashSet<String> }`
|
||||||
|
reachability list — every composing handler that wants to reach
|
||||||
|
worker A's `container/exec` must list `"worker-a/container/exec"` in its
|
||||||
|
scoped env. The prefix is bolted onto a flat map instead of being
|
||||||
|
structural routing.
|
||||||
|
|
||||||
|
3. **ADR-028's `remote_safe: bool` + `trusted_peer: bool` is a second,
|
||||||
|
parallel, weaker authorization system.** ADR-028 introduced a
|
||||||
|
`RemoteFilter { trusted_peer: bool }` gate in `protocol/dispatch.rs:48-70`
|
||||||
|
that runs *before* the existing `AccessControl::check`
|
||||||
|
(`registry/registration.rs:128-140`). `trusted_peer: true` is a blanket
|
||||||
|
security-bypass flag — the exact anti-pattern ADR-015 was written to kill
|
||||||
|
(it replaced `trusted: true` with the authority-switch model). ADR-028
|
||||||
|
reintroduced it at the peer boundary. The existing authorization
|
||||||
|
machinery in core (`Identity`, `IdentityProvider`, `AccessControl::check`)
|
||||||
|
is real, grounded, and already wired into the dispatch path — ADR-028
|
||||||
|
should have *used* it for peer authorization, not invented a parallel
|
||||||
|
system.
|
||||||
|
|
||||||
|
The head→many-workers / hub→spoke pattern (Ray.io's model) is the primary
|
||||||
|
downstream use case. The current model cannot express it. This is a blocking
|
||||||
|
structural fix, not a "v1/later" refinement.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. The Existing Authorization Machinery (What ADR-028 Should Have Used)
|
||||||
|
|
||||||
|
The dispatch path already runs `AccessControl::check` against the caller's
|
||||||
|
`Identity`. For a remote peer's call, the caller's `Identity` *is* the peer's
|
||||||
|
resolved identity. The machinery is complete:
|
||||||
|
|
||||||
|
```rust
// crates/alknet-core/src/auth.rs:14-19
pub struct Identity {
    pub id: String,                               // the peer's fingerprint/id
    pub scopes: Vec<String>,                      // what this peer is allowed to do
    pub resources: HashMap<String, Vec<String>>,  // resource-scoped grants
}

// crates/alknet-call/src/registry/spec.rs:31-37
pub struct AccessControl {
    pub required_scopes: Vec<String>,              // AND-gate
    pub required_scopes_any: Option<Vec<String>>,  // OR-gate
    pub resource_type: Option<String>,
    pub resource_action: Option<String>,
}
impl AccessControl { pub fn check(&self, identity: Option<&Identity>) -> AccessResult }
```
|
||||||
|
|
||||||
|
The dispatch path (`registry/registration.rs:112-144`) already does the right
|
||||||
|
thing:
|
||||||
|
|
||||||
|
- For **external** (wire) calls: ACL checks against `context.identity` — the
|
||||||
|
caller's identity, which for a peer call is the peer's `Identity` resolved
|
||||||
|
via `Dispatcher::resolve_identity` (`protocol/dispatch.rs:116-134`) from the
|
||||||
|
connection's TLS fingerprint or the call-protocol `auth_token` payload.
|
||||||
|
- For **internal** (composition) calls: ACL checks against
|
||||||
|
`context.handler_identity` (the `CompositionAuthority` synthesized as
|
||||||
|
`Identity`).
|
||||||
|
|
||||||
|
`Connection::identity()` (`crates/alknet-core/src/types.rs:486`) already
|
||||||
|
returns `Option<&Identity>` — the peer's resolved identity, set via
|
||||||
|
`Connection::set_identity`. `dispatch_requested` already reads it
|
||||||
|
(`protocol/dispatch.rs:222`). **The peer's `Identity` is already in the
|
||||||
|
dispatch path.** ADR-028's `remote_safe` gate is a parallel gate bolted on
|
||||||
|
*before* this existing check runs.
|
||||||
|
|
||||||
|
The security argument ADR-028 was trying to make — "a remote peer's call must
|
||||||
|
not populate `OperationContext.capabilities` from the local bundle unless the
|
||||||
|
op is explicitly exposed" — is already enforced by `AccessControl`: an op
|
||||||
|
whose `AccessControl` requires a scope the peer doesn't have returns
|
||||||
|
`FORBIDDEN` before the handler runs, so capabilities are never populated. An
|
||||||
|
op with `AccessControl::default()` (no restrictions) is implicitly callable
|
||||||
|
by any peer — including a remote one — because it requires no privileged
|
||||||
|
scope. An op that should never be callable from the wire uses
|
||||||
|
`Visibility::Internal`, which returns `NOT_FOUND` before ACL even runs (the
|
||||||
|
existing behavior, `registration.rs:124-126`).
|
||||||
|
|
||||||
|
**The op's `AccessControl` *is* the peer-authorization policy.** There is no
|
||||||
|
need for a separate `remote_safe` flag or `trusted_peer` bypass.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. Proposed Design
|
||||||
|
|
||||||
|
### 3.1 Peer-keyed overlays (research question 2)
|
||||||
|
|
||||||
|
The Layer 2 overlay becomes peer-keyed. Two shapes change:
|
||||||
|
|
||||||
|
**`CallConnection`'s overlay** — currently
|
||||||
|
`imported_operations: Arc<RwLock<HashMap<String, HandlerRegistration>>>`
|
||||||
|
(`protocol/connection.rs:36`). Under the peer model, the *head node* (which
|
||||||
|
holds many connections) needs a peer-keyed overlay across all its connections.
|
||||||
|
The per-`CallConnection` overlay stays flat (one connection = one peer), but
|
||||||
|
the *composition env* that aggregates multiple connections becomes peer-keyed:
|
||||||
|
|
||||||
|
```rust
// The per-connection overlay stays flat: one connection, one peer.
// CallConnection::imported_operations: HashMap<String, HandlerRegistration> (unchanged)

// The composite env becomes peer-keyed. This replaces
// CompositeOperationEnv's singular `connection: Option<Arc<dyn OperationEnv>>`.
pub struct PeerCompositeEnv {
    pub base: Arc<dyn OperationEnv + Send + Sync>,                          // Layer 0 curated
    pub session: Option<Arc<dyn OperationEnv + Send + Sync>>,               // Layer 1
    pub connections: HashMap<PeerId, Arc<dyn OperationEnv + Send + Sync>>,  // Layer 2, peer-keyed
    connection_order: Vec<PeerId>,  // insertion order for PeerRef::Any first-match
}
```
|
||||||
|
|
||||||
|
The `PeerId` is the peer's `Identity.id` — the same field
|
||||||
|
`Connection::identity()` already exposes. This is the natural key: it's
|
||||||
|
already resolved, already in the dispatch path, and already unique per peer.
|
||||||
|
|
||||||
|
**`contains()` across multiple peer overlays** — the composite env's
|
||||||
|
`contains(name)` returns true if *any* peer's overlay contains the name (the
|
||||||
|
union). This is the probe the fallthrough logic uses. A peer-qualified
|
||||||
|
`peer_contains(peer, name)` is added for `PeerRef::Specific` routing.
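
The two probes can be sketched over a simplified overlay map. `PeerOverlayIndex` is a hypothetical stand-in for `PeerCompositeEnv`, with each peer's `OperationEnv` reduced to the set of names it serves.

```rust
use std::collections::{HashMap, HashSet};

pub type PeerId = String; // assumption: PeerId aliases Identity.id

pub struct PeerOverlayIndex {
    pub connections: HashMap<PeerId, HashSet<String>>,
}

impl PeerOverlayIndex {
    /// Union probe: true if any peer's overlay serves `name`.
    pub fn contains(&self, name: &str) -> bool {
        self.connections.values().any(|ops| ops.contains(name))
    }

    /// Peer-qualified probe used for PeerRef::Specific routing.
    /// An unknown peer simply yields false.
    pub fn peer_contains(&self, peer: &str, name: &str) -> bool {
        self.connections
            .get(peer)
            .map_or(false, |ops| ops.contains(name))
    }
}
```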
|
||||||
|
|
||||||
|
### 3.2 `OperationEnv::invoke()` peer-routing signature (research question 1)
|
||||||
|
|
||||||
|
A `PeerRef` enum is added as the peer selector on the routing path:
|
||||||
|
|
||||||
|
```rust
pub enum PeerRef {
    Specific(PeerId),  // route to this exact peer; NOT_FOUND if it doesn't serve the op
    Any,               // route to the first peer (insertion order) that serves it
}
```
|
||||||
|
|
||||||
|
The `OperationEnv` trait gains a peer-routing method. Two integration options
|
||||||
|
(validated in the POC, §7):
|
||||||
|
|
||||||
|
**Option A — extend `OperationEnv` with a default-impl method:**
|
||||||
|
```rust
#[async_trait::async_trait]
pub trait OperationEnv: Send + Sync {
    // existing methods unchanged
    async fn invoke_with_policy(&self, namespace: &str, operation: &str,
        input: Value, parent: &OperationContext, policy: AbortPolicy) -> ResponseEnvelope;
    fn contains(&self, _name: &str) -> bool { true }

    // new peer-routing method, default-impl delegates to invoke_with_policy
    // (back-compat: existing impls that don't override it route to "any" /
    // the single connection, preserving current behavior).
    async fn invoke_peer(&self, peer: &PeerRef, namespace: &str, operation: &str,
        input: Value, parent: &OperationContext, policy: AbortPolicy) -> ResponseEnvelope {
        // default: ignore peer selector, dispatch via invoke_with_policy
        self.invoke_with_policy(namespace, operation, input, parent, policy).await
    }
}
```
|
||||||
|
|
||||||
|
**Option B — make `PeerRef` an optional parameter on `invoke_with_policy`.**
|
||||||
|
Heavier change; breaks all impls. Rejected for v1.
|
||||||
|
|
||||||
|
**Recommendation: Option A.** The default-impl method preserves back-compat
|
||||||
|
(existing `LocalOperationEnv`, `OverlayOperationEnv` work unchanged) and lets
|
||||||
|
`PeerCompositeEnv` override it with real peer routing. The existing
|
||||||
|
`invoke()` / `invoke_with_policy()` methods stay as the `PeerRef::Any`
|
||||||
|
equivalent for code that doesn't care about peer selection.
|
||||||
|
|
||||||
|
**Why `PeerRef` over the alternatives:**
|
||||||
|
|
||||||
|
| Alternative | Verdict |
|
||||||
|
|---|---|
|
||||||
|
| Peer-id string parameter | Rejected — too loose. No "any peer that serves this name" semantics; forces the caller to always pick a peer even when it doesn't care. |
|
||||||
|
| Encode peer into namespace (`"worker-a/container/exec"`) | Rejected — this is the flat-namespace-prefix hack (DC-3/OQ-28) the research exists to replace. Pushes disambiguation into naming conventions rather than structural routing. |
|
||||||
|
| `Route` struct carrying selector + policy | Deferred to v2. v1's `PeerRef` + insertion-order `Any` is the minimal shape. A `Route { selector, policy: RoutingPolicy }` (round-robin, least-loaded) is the natural extension and composes cleanly with `PeerRef`. |
|
||||||
|
|
||||||
|
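To make the `Specific` / `Any` semantics concrete, here is a small sketch of the peer-resolution step alone, separate from dispatch. `Router` and `resolve` are hypothetical names, and overlays are again modeled as sets of op names. The real `PeerCompositeEnv::invoke_peer` would go on to dispatch to the resolved peer's `OperationEnv`.

```rust
use std::collections::{HashMap, HashSet};

type PeerId = String;

#[derive(Debug, Clone, PartialEq)]
enum PeerRef {
    Specific(PeerId),
    Any,
}

/// Hypothetical stand-in for the routing state inside a peer-keyed composite env.
#[derive(Default)]
struct Router {
    connections: HashMap<PeerId, HashSet<String>>,
    connection_order: Vec<PeerId>, // insertion order, used by PeerRef::Any
}

impl Router {
    fn attach_peer(&mut self, peer: &str, ops: &[&str]) {
        // Record the peer's position only on first attach, so the order stays stable.
        if !self.connections.contains_key(peer) {
            self.connection_order.push(peer.to_string());
        }
        let set = ops.iter().map(|s| s.to_string()).collect();
        self.connections.insert(peer.to_string(), set);
    }

    /// Returns the peer that should serve `op`, or None (the NOT_FOUND case).
    fn resolve(&self, peer: &PeerRef, op: &str) -> Option<PeerId> {
        match peer {
            // Explicit routing: no fallthrough to other peers.
            PeerRef::Specific(id) => self
                .connections
                .get(id)
                .filter(|ops| ops.contains(op))
                .map(|_| id.clone()),
            // First peer, in insertion order, whose overlay contains the op.
            PeerRef::Any => self
                .connection_order
                .iter()
                .find(|id| {
                    self.connections
                        .get(id.as_str())
                        .map_or(false, |ops| ops.contains(op))
                })
                .cloned(),
        }
    }
}
```

Keeping resolution as a pure function of the map and the order vector is one way to leave room for a later routing policy: only the `Any` arm would change.
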
### 3.3 Retiring `remote_safe` / `trusted_peer` (research question 3)

`RemoteFilter` (`protocol/dispatch.rs:48-70`), `HandlerRegistration::remote_safe`
(`registry/registration.rs:41`), `CallClient::trusted_peer`
(`client/call_client.rs:99`), `OperationRegistry::list_operations_peer_scoped`
(`registry/registration.rs:103`), and
`services_list_handler_peer_scoped` (`registry/discovery.rs:202`) are all
**removed**. Peer authorization flows through the existing `AccessControl::check`:

- A remote peer's call arrives → `dispatch_requested` resolves the peer's
  `Identity` (already does, `dispatch.rs:222-223`) → `OperationRegistry::invoke`
  runs `AccessControl::check(peer_identity)` (`registration.rs:128-140`).
- If the op's `AccessControl` is satisfied → dispatch (capabilities populated
  from the bundle, same as today).
- If not → `FORBIDDEN` (capabilities never populated — the security property
  ADR-028 wanted, achieved by the existing ACL, not a parallel gate).
- If the op is `Visibility::Internal` → `NOT_FOUND` before ACL (existing
  behavior, `registration.rs:124-126`). This is the "never callable from wire"
  case — `Internal` is the existing mechanism for it.

**Does this fully replace `remote_safe`?** Yes. The three cases `remote_safe`
was meant to handle map to existing mechanisms:

| `remote_safe` case | Replacement |
|---|---|
| Op callable by any peer (was `remote_safe: true`) | `AccessControl::default()` — no restrictions, any authenticated (or unauthenticated) peer may call. Implicitly "remote-safe" because it requires no privileged scope. |
| Op callable only by some peers | `AccessControl { required_scopes: [...] }` — only peers whose `Identity.scopes` satisfy the AND-gate may call. Per-peer differentiation via `IdentityProvider` config (different peers get different scopes). |
| Op never callable from wire | `Visibility::Internal` — `NOT_FOUND` before ACL. Existing mechanism, unchanged. |

**The capability-exposure concern (ADR-028 Context).** ADR-028's worry was
"a remote peer's call must not populate `OperationContext.capabilities` from
the local bundle unless the op is explicitly exposed." Under the `AccessControl`
model, "the op is callable by this peer" *is* "the op is exposed to this
peer" — there is no separate exposure decision. If the peer's `Identity`
satisfies the op's `AccessControl`, the op dispatches and capabilities
populate (same as for any authorized caller). If not, `FORBIDDEN` before the
handler — capabilities never populate. The exposure decision and the
authorization decision are the same decision, made through one mechanism
(`AccessControl`), not two (`AccessControl` + `remote_safe`).

The one residual concern: an op with `AccessControl::default()` (no
restrictions) is callable by *any* peer, including an unauthenticated one.
This is correct — an op that requires no privileged scope is implicitly
safe to expose. If the operator wants to restrict it, they set
`required_scopes`. This is the same posture as every other ACL-gated system:
default-open for unrestricted ops, default-closed for privileged ops, and
`Internal` for never-wire-callable ops.

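The decision order described in this section (visibility first, then the scope AND-gate) can be sketched as below. The types are simplified, hypothetical stand-ins for the real `Visibility`, `AccessControl`, and `Identity`. In particular, the real `AccessControl` has a richer shape than a scope list, and `authorize_wire_call` is an illustrative name, not the registry's API.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Decision {
    Allowed,
    Forbidden,
    NotFound,
}

#[derive(PartialEq)]
enum Visibility {
    External,
    Internal,
}

/// Simplified stand-in: only the scope AND-gate is modeled.
#[derive(Default)]
struct AccessControl {
    required_scopes: Vec<String>,
}

/// Simplified stand-in for a resolved peer identity.
struct Identity {
    scopes: HashSet<String>,
}

fn authorize_wire_call(
    visibility: &Visibility,
    acl: &AccessControl,
    peer: Option<&Identity>,
) -> Decision {
    // Internal ops are never callable from the wire: NOT_FOUND before any ACL check.
    if *visibility == Visibility::Internal {
        return Decision::NotFound;
    }
    // An unrestricted op is callable by any peer, authenticated or not.
    if acl.required_scopes.is_empty() {
        return Decision::Allowed;
    }
    // AND-gate: every required scope must be present on the peer's identity.
    match peer {
        Some(id) if acl.required_scopes.iter().all(|s| id.scopes.contains(s)) => {
            Decision::Allowed
        }
        _ => Decision::Forbidden,
    }
}
```

Only the `Allowed` branch would go on to populate capabilities and run the handler, which is the property ADR-028 was after.
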
### 3.4 `ScopedOperationEnv` under the peer model (research question 1, cont.)

The current `ScopedOperationEnv { allowed: HashSet<String> }`
(`registry/context.rs:67-88`) enumerates flat op names. Under the peer model,
reachability may need to be peer-qualified: a handler may reach
`"worker-a/container/exec"` but not `"worker-b/container/exec"`.

**v1 design: keep `ScopedOperationEnv` as-is for the *unqualified* reachability
(the common case — peer-agnostic composition), add an *optional* peer-pinned
allowlist for the case where a handler must be pinned to a specific peer:**

```rust
pub struct ScopedPeerEnv {
    /// Unqualified — op names reachable from any peer (or locally).
    /// A handler with "container/exec" here may compose it via PeerRef::Any
    /// or PeerRef::Specific(any-peer-that-serves-it).
    pub allowed_ops: HashSet<String>,
    /// Peer-pinned — "peer-id/op-name" entries. A handler with
    /// "worker-a/container/exec" here may compose it via
    /// PeerRef::Specific("worker-a") but NOT via PeerRef::Specific("worker-b")
    /// even if worker-b also serves container/exec.
    pub peer_pinned: HashSet<String>,
}
```

This keeps the common case (peer-agnostic composition: "I want to call
`container/exec` on whichever worker serves it") simple — just list the op
name in `allowed_ops`. Peer-pinning is opt-in for the disambiguation case
that replaces `FromCallConfig::namespace_prefix` (OQ-28): instead of prefixing
the *op name*, you pin the *peer* in the reachability set.

**Integration with the existing `ScopedOperationEnv`:** the POC validates
that `ScopedPeerEnv` composes with the existing `ScopedOperationEnv` — the
unqualified `allowed_ops` is the same shape as `ScopedOperationEnv.allowed`,
and the peer-pinned set is additive. The migration path is: existing
`ScopedOperationEnv` becomes the `allowed_ops` field; peer-pinning is a new
opt-in field.

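A possible shape for the peer-qualified check is sketched below. The `allows(peer, op)` signature is an assumption: it takes `None` for `PeerRef::Any` and `Some(id)` for `PeerRef::Specific(id)`. It also assumes that a peer-pinned entry alone does not authorize an `Any` call, which the text above does not state explicitly.

```rust
use std::collections::HashSet;

#[derive(Default)]
struct ScopedPeerEnv {
    /// Unqualified op names, reachable on any peer.
    allowed_ops: HashSet<String>,
    /// "peer-id/op-name" entries, reachable only on the named peer.
    peer_pinned: HashSet<String>,
}

impl ScopedPeerEnv {
    /// `peer` is None for PeerRef::Any and Some(id) for PeerRef::Specific(id).
    fn allows(&self, peer: Option<&str>, op: &str) -> bool {
        // An unqualified entry authorizes the op regardless of peer selection.
        if self.allowed_ops.contains(op) {
            return true;
        }
        // Otherwise only an exact "peer/op" pin authorizes the call.
        // Assumption: a pin never authorizes an unpinned (Any) call.
        match peer {
            Some(id) => self.peer_pinned.contains(&format!("{id}/{op}")),
            None => false,
        }
    }
}
```

With an empty `peer_pinned` set this reduces to the existing flat `ScopedOperationEnv` check, which is what makes the migration additive.
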
### 3.5 `services/list` across a peer graph (research question 4)

When worker A calls `services/list` on a head that has re-exported worker B's
ops, worker A sees:

- **v1 default**: the head's own Layer 0 `External` ops, filtered to those
  worker A is authorized to call (`AccessControl::check(worker_a_identity)`).
  Unchanged from today's `services_list_handler` (`registry/discovery.rs:175`),
  except the filter is `AccessControl`-based, not `remote_safe`-based.
- **Re-export listing** (new, opt-in): a `services/list-peers` op (or a
  `?include_peers=true` flag) lists the peer overlays with attribution. Each
  peer's sub-overlay is listed as a
  `PeerServiceListing { peer: Option<PeerId>, operations: Vec<PeerOpSummary> }`.
  The listing is filtered by the calling
  peer's `Identity` — a peer sees re-exported ops only if it is authorized to
  call them (the listing op's own `AccessControl` gates who may call
  `services/list-peers`, and the listed ops' `AccessControl` determines
  whether the calling peer could actually dispatch them).

The `services_list_handler` / `services_list_handler_peer_scoped` split
(`registry/discovery.rs:175-224`) collapses to a single `AccessControl`-filtered
handler. The `peer_scoped` variant (which took `trusted_peer: bool`) is removed;
the filtering is done by `AccessControl::check(calling_peer_identity)` inside
the handler, same as every other op.

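The peer-attributed listing might look roughly like this. It is a sketch under simplifying assumptions: `Op` and its scope list stand in for a registration plus its `AccessControl`, `operations` holds plain names instead of `PeerOpSummary` values, and `list_peers` is a hypothetical function name.

```rust
use std::collections::HashSet;

type PeerId = String;

/// Hypothetical stand-in for a registered op and its scope AND-gate.
struct Op {
    name: String,
    required_scopes: Vec<String>,
}

#[derive(Debug, PartialEq)]
struct PeerServiceListing {
    peer: Option<PeerId>,    // None = the head's own ops
    operations: Vec<String>, // simplified: names only
}

/// True if the caller's scopes satisfy every scope the op requires.
fn callable(op: &Op, caller_scopes: &HashSet<String>) -> bool {
    op.required_scopes.iter().all(|s| caller_scopes.contains(s))
}

/// Build one listing section, keeping only ops the caller could dispatch.
fn section(peer: Option<PeerId>, ops: &[Op], caller_scopes: &HashSet<String>) -> PeerServiceListing {
    PeerServiceListing {
        peer,
        operations: ops
            .iter()
            .filter(|op| callable(op, caller_scopes))
            .map(|op| op.name.clone())
            .collect(),
    }
}

/// Own-ops section first, then one attributed section per peer.
fn list_peers(
    own: &[Op],
    peers: &[(PeerId, Vec<Op>)],
    caller_scopes: &HashSet<String>,
) -> Vec<PeerServiceListing> {
    let mut out = vec![section(None, own, caller_scopes)];
    for (id, ops) in peers {
        out.push(section(Some(id.clone()), ops, caller_scopes));
    }
    out
}
```
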
### 3.6 `from_call` under the peer model (research question 5)

`from_call` (`client/from_call.rs:68-108`) discovers the remote peer's ops and
registers them. Under peer-keyed overlays, the registration target is the
*specific peer's* sub-overlay, not a flat overlay:

```rust
// Before (flat): connection.register_imported(reg) — into the connection's flat overlay
// After (peer-keyed): peer_overlay.register_imported(peer_id, reg) — into the peer's sub-overlay
```

**Collision behavior (OQ-28) dissolves across peers.** Same name on different
peers is fine — they live in separate sub-overlays, no collision, no prefix
needed. The collision rule stays *within* a peer: same name on the *same* peer
is still an error (a peer shouldn't expose two ops with the same name). This
is the `SamePeerCollision` error in the POC.

**`FromCallConfig::namespace_prefix` becomes optional sugar** for the case
where the *importing* node wants to expose a peer's ops under a different name
*locally* (e.g., import worker-a's `container/exec` as `worker-a/container/exec`
in the local Layer 0 for composition by handlers that use the flat
`ScopedOperationEnv`). This is a local-naming concern, not a disambiguation
concern — the peer-keyed overlay already disambiguates by peer. The prefix is
only for the local-naming-sugar case and defaults to `None`.

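The collision rule can be illustrated with a reduced `PeerOverlay`. This is a sketch, not the POC module: registrations are modeled as plain strings instead of `HandlerRegistration`, and the `register_imported(peer, op, reg)` parameter list and the fields of `SamePeerCollision` are assumptions.

```rust
use std::collections::HashMap;

type PeerId = String;

#[derive(Debug, PartialEq)]
enum FromCallError {
    SamePeerCollision { peer: PeerId, op: String },
}

#[derive(Default)]
struct PeerOverlay {
    // peer -> (op name -> stand-in registration)
    by_peer: HashMap<PeerId, HashMap<String, String>>,
    peer_order: Vec<PeerId>, // insertion order for PeerRef::Any
}

impl PeerOverlay {
    fn register_imported(&mut self, peer: &str, op: &str, reg: &str) -> Result<(), FromCallError> {
        if !self.by_peer.contains_key(peer) {
            self.peer_order.push(peer.to_string());
        }
        let ops = self.by_peer.entry(peer.to_string()).or_default();
        // Same name on the same peer is an error; other peers are unaffected.
        if ops.contains_key(op) {
            return Err(FromCallError::SamePeerCollision {
                peer: peer.to_string(),
                op: op.to_string(),
            });
        }
        ops.insert(op.to_string(), reg.to_string());
        Ok(())
    }

    /// Structural disconnect cleanup: the peer's whole sub-overlay goes away.
    fn drop_peer(&mut self, peer: &str) {
        self.by_peer.remove(peer);
        self.peer_order.retain(|p| p.as_str() != peer);
    }
}
```

Because each peer owns its own inner map, the cross-peer case needs no check at all; only the per-peer insert can fail.
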
### 3.7 Multi-hop federation (research question 6 — out of scope for v1)

If worker A imports from the head, and the head imports from worker B, does
worker A transitively see worker B's ops? **v1: no.** The peer-keyed overlay
model is one-hop. A handler on the head can compose worker B's ops (they're in
the head's peer-keyed overlay), but worker A does not transitively see them
unless the head explicitly re-exports them (the `services/list-peers` opt-in
above).

**Does the peer-keyed model foreclose multi-hop?** No — it extends naturally.
The `PeerCompositeEnv.connections: HashMap<PeerId, Arc<dyn OperationEnv>>`
already keys by `PeerId`; a multi-hop path is a chain of `PeerRef::Specific`
routing decisions. The question is whether path-finding (which peer reaches
which op transitively) becomes real, which is where petgraph would pay off.
For v1 (one hop, shallow), a nested `HashMap<PeerId, HashMap<String, ...>>`
suffices. **Petgraph is not needed for v1.** It pays off if/when multi-hop
federation with path-finding becomes a real use case — the peer-keyed overlay
model extends to it without redesign, by adding a path-finding layer over the
peer-keyed map. This is noted, not designed.

---

## 4. Prior Art Analysis

### 4.1 Ray.io (https://docs.ray.io/en/latest/ray-core/actors.html)

Ray's model is the head→many-workers pattern this research targets. Key
prior art:

- **`ray.remote(Class)` / `@ray.remote`** — decorates a class as an *actor*
  (stateful worker). Instantiating `Counter.remote()` creates a new worker
  and returns an `ActorHandle`. This is the `PeerRef::Specific` analog — the
  handle *is* the peer reference; calling `counter.increment.remote()` routes
  to that specific actor.
- **Named actors** — Ray supports named actors (`Counter.options(name="my-counter").remote()`)
  addressable by name. This is the `PeerRef::Specific(peer_id)` case where
  `peer_id` is a human-readable name.
- **`ray.get(obj_ref)`** — retrieves results by object reference, decoupling
  invocation from result retrieval. alknet-call's `ResponseEnvelope` is the
  direct-return analog (no separate object store).
- **Scheduling** — Ray chooses a node for each actor based on resource
  requirements and scheduling strategy. alknet-call's `PeerRef::Any`
  (insertion-order first-match) is the v1 analog; a richer `RoutingPolicy`
  (round-robin, least-loaded) is the future extension.
- **No ACL model.** Ray assumes a trusted cluster (all workers under a single
  administrative control). alknet-call's `AccessControl`-based peer
  authorization is *stronger* than Ray's model — it handles semi-trusted peers
  (the runner/dispatch pattern ADR-028 was concerned about) via scopes, not a
  blanket trust flag.

**Takeaway:** Ray's `ActorHandle` is the `PeerRef::Specific` analog. Ray has
no "any worker" primitive at the API level (you always address a specific
actor handle); alknet-call's `PeerRef::Any` is an addition for the
fan-out-to-any-worker case. Ray's lack of an ACL model is a gap alknet-call
fills with `AccessControl`.

### 4.2 Dapr service invocation (https://docs.dapr.io/developing-applications/building-blocks/service-invocation/service-invocation-overview/)

Dapr's model is the service-mesh analog. Key prior art:

- **App ID routing.** Dapr routes by `dapr-app-id` — each application has a
  unique ID, and invocation targets `<app-id>/<method>`. This is the
  `PeerRef::Specific(app_id)` analog. App ID is unique per *application*, not
  per instance — multiple instances share an app ID and Dapr load-balances
  across them (round-robin via mDNS).
- **Round-robin load balancing.** Dapr round-robins across instances of the
  same app ID. This is the `PeerRef::Any` + `RoutingPolicy::RoundRobin` analog
  — the v1 insertion-order first-match is the simplest policy; round-robin is
  the natural v2 addition.
- **Access control allow lists.** Dapr has an access-control policy
  ("which applications are allowed to call them, what applications are
  authorized to do") — this is the `AccessControl`-based peer authorization
  alknet-call already has. Dapr's model is a sidecar-level allowlist;
  alknet-call's is per-op `AccessControl` on the registration bundle. Same
  concept, finer granularity.
- **Namespace scoping.** Dapr scopes applications to namespaces; calls cross
  namespaces with explicit namespace qualification. This is the
  `PeerRef::Specific` + peer-pinned reachability analog.
- **mTLS between sidecars.** Dapr's security is at the transport (mTLS between
  Dapr sidecars). alknet-call's is at the transport (QUIC TLS) *and* the
  protocol (`auth_token` payload → `Identity` → `AccessControl`). The
  `AccessControl` layer is the application-level authorization Dapr's
  allowlist provides.

**Takeaway:** Dapr's app-ID routing confirms `PeerRef::Specific(PeerId)` is
the right shape — `PeerId` is the app-ID analog. Dapr's round-robin confirms
`PeerRef::Any` + a routing policy is the right fan-out shape. Dapr's
access-control allowlist confirms `AccessControl`-based peer authorization
is the right model — alknet-call already has it; ADR-028 should have used it.

### 4.3 Other relevant prior art

- **TypeScript `@alkdev/operations` `buildEnv()`** (referenced in ADR-015) —
  the `allowedNamespaces` scoping is the flat-namespace-prefix model this
  research replaces. The Rust `ScopedOperationEnv` already moved to
  operation-level granularity; the peer model extends it to peer-qualified
  granularity.
- **`/workspace/@alkdev/flowgraph`** (referenced in ADR-022) — the graph
  model (operation graph, call graph, scoped subgraph). The peer-keyed
  overlay is the peer dimension of the operation graph. Petgraph is the
  future library for when path-finding across the peer graph becomes real;
  v1's nested `HashMap` is the implicit-graph representation.

---

## 5. OQ Impact

| OQ | Status before | Status after | Notes |
|---|---|---|---|
| **OQ-25** (remote-safe marking shape) | open (two-way) | **Dissolved** | `remote_safe: bool` is removed entirely. The "shape" question is moot — there is no marking. Peer authorization is `AccessControl`-based, which already has a rich shape (scopes, resources, AND/OR gates). Per-peer differentiation is via `IdentityProvider` config (different peers get different scopes), not a per-op marking. |
| **OQ-26** (OperationAdapter error type) | open (two-way) | **Stays** | Unaffected. `from_call` still returns `Result<_, AdapterError>`; the peer-keying changes the registration target, not the error type. A `SamePeerCollision` variant may be added (replacing the flat `Conflict` variant). |
| **OQ-27** (from_call re-import trigger) | open (two-way) | **Stays** | Unaffected. Auto-on-reconnect is still the default; the overlay is now peer-scoped (drops with the connection), so re-import is naturally scoped to the new peer. |
| **OQ-28** (from_call namespace collision) | open (two-way) | **Dissolved (cross-peer) / stays (same-peer)** | Cross-peer collision dissolves: same name on different peers is fine (separate sub-overlays). Same-peer collision stays an error (`SamePeerCollision`). The `namespace_prefix` becomes optional local-naming sugar, not the disambiguation mechanism. |
| **OQ-29** (CallClient TLS client-auth) | open (two-way) | **Stays** | Unaffected. TLS client-auth is orthogonal to the routing model. |

**New OQs surfaced by this research:**

- **OQ-30 (proposed): `PeerRef::Any` routing policy.** v1 uses insertion-order
  first-match. A richer policy (round-robin, least-loaded, affinity) is the
  two-way-door remainder. Tracked as a new OQ; the `PeerRef` enum is designed
  to compose with a future `RoutingPolicy` without breaking the signature.
- **OQ-31 (proposed): `services/list-peers` re-export semantics.** Whether
  re-exported peer ops are listed by default, opt-in, or per-peer-policy is a
  two-way-door. v1 defaults to "own ops only" (unchanged from today);
  `services/list-peers` is the opt-in. The re-export policy (which peers' ops
  a given peer sees) is an `AccessControl` decision on the listing op.
- **OQ-32 (proposed): Multi-hop federation.** Whether worker A transitively
  sees worker B's ops through the head is a one-way door on the federation
  model. v1 is one-hop (no transitive visibility). The peer-keyed overlay
  model extends to multi-hop without redesign but requires a path-finding
  layer (petgraph candidate). Tracked as a future OQ, not a v1 decision.

---

## 6. Open Questions the Research Surfaces but Doesn't Resolve

1. **`PeerId` stability across reconnects.** If a peer's `Identity.id` is its
   TLS fingerprint, reconnects with a rotated key change the `PeerId`. The
   peer-keyed overlay drops the old `PeerId`'s sub-overlay on disconnect and
   creates a new one on reconnect — structurally clean, but a handler
   mid-composition that captured a `PeerRef::Specific(old_peer_id)` gets
   `NOT_FOUND` after reconnect. Is this acceptable, or does `PeerId` need to
   be a stable logical identifier (e.g., a configured node name) separate from
   the cryptographic identity? v1: `PeerId = Identity.id` (the fingerprint);
   stable-logical-id is a future question.

2. **`PeerRef::Any` determinism.** Insertion-order first-match is deterministic
   but order-dependent. If worker A connects before worker B, `Any` always
   routes to A until A disconnects. Is this the right default, or should
   `Any` be round-robin from the start? v1: insertion-order (simplest,
   deterministic); round-robin is OQ-30.

3. **Reachability check ordering.** The current `invoke_with_policy` checks
   `parent.scoped_env.allows(&name)` *before* routing
   (`registry/env.rs:140-142`). Under the peer model, the reachability check
   is peer-qualified (`ScopedPeerEnv::allows(peer, op)`). Should the
   reachability check happen before or after peer resolution? v1: before
   (same as today) — the scoped env is checked against the *resolved* name,
   and peer-qualified reachability is part of the check. The POC validates
   this composes.

4. **Capability exposure under `PeerRef::Any`.** When a handler composes via
   `PeerRef::Any` and the routing picks worker A, the handler's
   `Capabilities` propagate to worker A's call (same as today's
   `from_call` forwarding). Is this correct when the handler didn't know
   which peer would be selected? v1: yes — the handler declared the op in
   its scoped env, so it authorized the composition; the peer selection is a
   routing detail. If a handler needs per-peer capability scoping, it uses
   `PeerRef::Specific` and peer-pinned reachability.

---

## 7. POC Validation Results

A scratch POC module (`crates/alknet-call/src/scratch_peer_routing.rs`) was
written in-repo, type-checked against the real types via a temporary
`scratch-peer-routing` Cargo feature, validated, and **removed**. The repo
is clean: `cargo check -p alknet-call` passes, all 207 lib tests pass.

### What the POC validated (compiles and works):

1. **`PeerRef` enum + `PeerRoutingEnv` trait** — the peer-routing signature
   compiles against the real `OperationContext`, `ResponseEnvelope`,
   `AbortPolicy`, and `Arc<dyn OperationEnv>`. The `invoke_peer` method is
   implementable and `Send + Sync` (required for the `tokio::spawn` dispatch
   loop).

2. **`PeerCompositeEnv` with `HashMap<PeerId, Arc<dyn OperationEnv>>`** —
   the peer-keyed composite env compiles. `attach_peer` / `detach_peer` /
   `invoke_peer` (with `PeerRef::Specific` and `PeerRef::Any`) all type-check.
   The `contains()` (union across peers) and `peer_contains()` (specific
   peer) probes work. `Send + Sync` verified.

3. **`PeerOverlay` (`HashMap<PeerId, HashMap<String, HandlerRegistration>>`)** —
   the peer-keyed overlay compiles. Same name on two peers (no collision),
   `first_peer_for` (Any routing), `drop_peer` (structural disconnect
   cleanup) all type-check and behave correctly.

4. **`AccessControl::check(peer_identity)` is sufficient** — the
   `authorize_peer_call` function compiles and the assertions hold:
   - Peer with the right scope → `Allowed`.
   - Peer without the scope → `Forbidden`.
   - No identity (unauthenticated) → `Forbidden` (auth required).
   - Op with `AccessControl::default()` → `Allowed` for any peer (implicitly
     remote-safe).
   - `Visibility::Internal` op → `Forbidden` for wire calls (`NOT_FOUND` in
     dispatch; never callable from wire regardless of peer).

5. **`ScopedPeerEnv` (peer-qualified reachability)** — compiles and composes
   with the existing `ScopedOperationEnv` shape. Unqualified `allowed_ops`
   (peer-agnostic) + peer-pinned `peer_pinned` set. `allows(peer, op)` checks
   both. The assertions hold: peer-pinned to worker-a allows `Specific(worker-a)`
   but not `Specific(worker-b)`; unqualified allows `Any`.

6. **`list_services_peer_attributed`** — peer-attributed services/list
   compiles. Filters by `AccessControl::check(calling_peer_identity)` —
   only lists ops the calling peer is authorized to call. Own ops section
   (`peer: None`) + per-peer re-exported sections (`peer: Some(id)`).

7. **`from_call_peer_keyed` + `FromCallConfigPeer` + `FromCallError`** —
   the peer-aware from_call shape compiles. `namespace_prefix` is optional
   sugar (local naming), `SamePeerCollision` replaces the flat `Conflict`.

### What didn't work / required adjustment:

- **`HandlerRegistration` is not `Clone`** — the POC initially tried
  `reg.clone()` to register the same op into two peers' sub-overlays. Fixed
  by constructing fresh registrations per peer (a helper `make_exec_reg()`).
  This is a POC artifact, not a design issue — the real `from_call` produces
  fresh registrations per peer anyway (each peer's discovery produces its own
  bundles).
- **`#[cfg(any())]` does not type-check.** The common Rust POC pattern
  `#[cfg(any())] pub mod scratch;` compiles but does *not* type-check the
  module (the predicate is never true, so the module is excluded from
  compilation entirely). To validate types, the POC must be actually
  compiled. Used a temporary Cargo feature (`scratch-peer-routing`) to
  enable type-checking, then removed the feature. This is the correct
  pattern for POC validation that needs type-checking.
- **`#[cfg(all)]` is not the built-in `all` predicate** — it's treated as a
  custom cfg that's false by default (with a warning). Don't use it; use a
  feature gate.

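The difference between the two gating styles can be seen in a few lines. This is an illustrative fragment, not the removed scratch module: the module bodies and `scratch_enabled` are made up, and it assumes a `scratch-peer-routing` feature declared under `[features]` in the crate's `Cargo.toml`.

```rust
// Never compiled: `any()` with no arguments is always false, so the body is
// parsed but never type-checked. The type error below goes unnoticed.
#[cfg(any())]
mod never_checked {
    pub fn broken() -> u32 {
        "not a number"
    }
}

// Compiled and type-checked only when the feature is enabled, for example
// with `cargo check --features scratch-peer-routing`.
#[cfg(feature = "scratch-peer-routing")]
mod scratch_peer_routing {
    pub fn probe() -> u32 {
        1
    }
}

/// Reports at compile time whether the scratch feature is active.
pub fn scratch_enabled() -> bool {
    cfg!(feature = "scratch-peer-routing")
}
```

A normal build without the feature skips both modules; enabling the feature brings the second one under the type checker, which is the property the POC needed.
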
### POC artifacts (not in repo):

The POC code is preserved in this research document's appendix (§10) for
reference. The scratch module was removed from the repo; only the research
doc and ADR draft survive.

---

## 8. Recommended `OperationEnv::invoke()` Signature

```rust
/// How a composing handler addresses a peer when invoking an operation.
#[derive(Debug, Clone)]
pub enum PeerRef {
    /// Route to this exact peer's overlay. NOT_FOUND if it doesn't serve the op
    /// (no silent fallthrough to other peers — explicit routing must be
    /// honored or fail loudly).
    Specific(PeerId),
    /// Route to the first peer (insertion order) whose overlay contains the op.
    /// This is the "any worker that serves this name" fan-out primitive.
    /// v1 uses insertion order; a richer RoutingPolicy is OQ-30.
    Any,
}

pub type PeerId = String; // = Identity.id (the peer's fingerprint / declared label)

#[async_trait::async_trait]
pub trait OperationEnv: Send + Sync {
    // Existing methods — unchanged (back-compat).
    async fn invoke(&self, namespace: &str, operation: &str, input: Value,
        parent: &OperationContext) -> ResponseEnvelope { /* default delegates */ }
    async fn invoke_with_policy(&self, namespace: &str, operation: &str,
        input: Value, parent: &OperationContext, policy: AbortPolicy) -> ResponseEnvelope;
    fn contains(&self, _name: &str) -> bool { true }

    // NEW: peer-routing method. Default-impl delegates to invoke_with_policy
    // (back-compat: existing impls that don't override it route to "any" /
    // the single connection, preserving current behavior). PeerCompositeEnv
    // overrides with real peer routing.
    async fn invoke_peer(&self, peer: &PeerRef, namespace: &str, operation: &str,
        input: Value, parent: &OperationContext, policy: AbortPolicy) -> ResponseEnvelope {
        self.invoke_with_policy(namespace, operation, input, parent, policy).await
    }

    // NEW: peer-qualified contains. Default: delegate to contains (back-compat).
    fn peer_contains(&self, _peer: &PeerId, name: &str) -> bool { self.contains(name) }
}
```

---

## 9. Recommended Peer-Keyed Overlay Shape

```rust
// Per-connection overlay — UNCHANGED (one connection = one peer, flat map is fine).
// crates/alknet-call/src/protocol/connection.rs
pub struct CallConnection {
    connection: Arc<Connection>,
    imported_operations: Arc<RwLock<HashMap<String, HandlerRegistration>>>, // flat, per-connection
    pending: Arc<Mutex<PendingRequestMap>>,
}

// Composite env — BECOMES peer-keyed (replaces CompositeOperationEnv's
// singular `connection: Option<Arc<dyn OperationEnv>>`).
pub struct PeerCompositeEnv {
    pub base: Arc<dyn OperationEnv + Send + Sync>,                         // Layer 0 curated
    pub session: Option<Arc<dyn OperationEnv + Send + Sync>>,              // Layer 1
    pub connections: HashMap<PeerId, Arc<dyn OperationEnv + Send + Sync>>, // Layer 2, peer-keyed
    connection_order: Vec<PeerId>, // insertion order for PeerRef::Any first-match
}

// Peer-keyed overlay (used by the head node aggregating multiple connections).
#[derive(Default)]
pub struct PeerOverlay {
    by_peer: HashMap<PeerId, HashMap<String, HandlerRegistration>>,
    peer_order: Vec<PeerId>, // insertion order for PeerRef::Any
}
```

**Migration path:** `CompositeOperationEnv` (singular connection) becomes
`PeerCompositeEnv` (peer-keyed connections). The singular-connection case (one
peer) is the degenerate case: `connections: HashMap` with one entry. Existing
call sites that construct `CompositeOperationEnv::new(base, Some(conn), session)`
migrate to `PeerCompositeEnv::new(base).with_session(session).attach_peer(peer_id, conn)`.

---
|
||||||
|
|
||||||
|
## 10. Appendix: POC Code (Reference)

The POC module validated the design. Its key structures are listed here for
reference; the module itself is **not** in the repo (removed after validation).

<details>
<summary>POC module (scratch_peer_routing.rs): click to expand</summary>

```rust
// (The full POC module, ~800 lines, validated against real types.
// Key structures: PeerRef, PeerRoutingEnv trait, PeerCompositeEnv, PeerOverlay,
// ScopedPeerEnv, authorize_peer_call, list_services_peer_attributed,
// from_call_peer_keyed, FromCallConfigPeer, FromCallError.
// See the research author's working tree for the full file; the structures
// are summarized in §3 and §8-9 above.)
```

</details>

The POC validated:

- `PeerRef` + `PeerRoutingEnv` compile against real types.
- `PeerCompositeEnv` routes `invoke_peer` to the right peer.
- `AccessControl::check(peer_identity)` authorizes without `remote_safe`.
- `ScopedPeerEnv` peer-qualified reachability composes with existing `ScopedOperationEnv`.
- `PeerOverlay` same-name-on-different-peers (no collision) + `drop_peer` cleanup.
- `list_services_peer_attributed` filters by `AccessControl::check(calling_peer)`.
- All shapes are `Send + Sync`.
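
The `PeerOverlay` behaviour in the list above (same name on different peers is fine, same-peer collision is an error, `drop_peer` cleans up) can be sketched as follows. This is not the removed POC code: `HandlerRegistration` is a hypothetical one-field stand-in, and the `register` and `peers_exposing` methods and the `OverlayError` type are assumed names introduced for illustration. Only the two struct fields and `drop_peer` are named in this document.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real alknet-call types.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct PeerId(pub &'static str);

#[derive(Clone, Debug, PartialEq)]
pub struct HandlerRegistration {
    pub description: String,
}

#[derive(Debug, PartialEq)]
pub enum OverlayError {
    SamePeerCollision { peer: PeerId, name: String },
}

#[derive(Default)]
pub struct PeerOverlay {
    by_peer: HashMap<PeerId, HashMap<String, HandlerRegistration>>,
    peer_order: Vec<PeerId>, // insertion order for PeerRef::Any
}

impl PeerOverlay {
    // Registers into the peer's own sub-overlay; only a same-peer duplicate fails.
    pub fn register(
        &mut self,
        peer: PeerId,
        name: &str,
        reg: HandlerRegistration,
    ) -> Result<(), OverlayError> {
        if !self.by_peer.contains_key(&peer) {
            self.peer_order.push(peer.clone());
        }
        let ops = self.by_peer.entry(peer.clone()).or_default();
        if ops.contains_key(name) {
            return Err(OverlayError::SamePeerCollision { peer, name: name.to_string() });
        }
        ops.insert(name.to_string(), reg);
        Ok(())
    }

    // Peers exposing `name`, in the order they were first seen.
    pub fn peers_exposing(&self, name: &str) -> Vec<PeerId> {
        self.peer_order
            .iter()
            .filter(|peer| self.by_peer.get(*peer).map_or(false, |ops| ops.contains_key(name)))
            .cloned()
            .collect()
    }

    // Removes the peer's whole sub-overlay; returns how many operations went with it.
    pub fn drop_peer(&mut self, peer: &PeerId) -> usize {
        self.peer_order.retain(|p| p != peer);
        self.by_peer.remove(peer).map_or(0, |ops| ops.len())
    }
}
```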

---

## 11. ADR Draft (Supersedes ADR-028)

> **Note**: The full ADR should be written as a separate document
> (`docs/architecture/decisions/029-peer-graph-routing-model.md`) after
> review of this research. The draft below captures the decision shape; the
> ADR author should expand the Context with the problem statement from §1,
> the Consequences from §3, and the Assumptions from §6.

```markdown
# ADR-029: Peer-Graph Routing Model for alknet-call Composition

## Status

Proposed (supersedes ADR-028)

## Context

[Summarize §1: flat-namespace single-peer model breaks for head→N-workers;
ADR-028's remote_safe/trusted_peer is a parallel, weaker authorization system
that doesn't compose with the existing AccessControl/Identity machinery.
The head→many-workers pattern (Ray.io's model) is the primary use case and
cannot be expressed today. This is a blocking structural fix.]

## Decision

### 1. Peer-keyed overlays

The Layer 2 overlay becomes peer-keyed. `CompositeOperationEnv`'s singular
`connection: Option<Arc<dyn OperationEnv>>` is replaced by
`PeerCompositeEnv` with `connections: HashMap<PeerId, Arc<dyn OperationEnv>>`.
[§3.1, §9]

### 2. `PeerRef` routing selector

`OperationEnv` gains a peer-routing method with a `PeerRef` selector
(`Specific(PeerId)` / `Any`). A default impl preserves back-compat.
[§3.2, §8]

### 3. `AccessControl`-based peer authorization; retire `remote_safe`/`trusted_peer`

`RemoteFilter`, `HandlerRegistration::remote_safe`, `CallClient::trusted_peer`,
`list_operations_peer_scoped`, and `services_list_handler_peer_scoped` are
removed. Peer authorization flows through the existing `AccessControl::check`
against the peer's resolved `Identity`. The op's `AccessControl` *is* the
peer-authorization policy. [§3.3]

### 4. Peer-qualified reachability (`ScopedPeerEnv`)

`ScopedOperationEnv` is extended with an optional peer-pinned allowlist.
Unqualified reachability (peer-agnostic composition) stays the common case;
peer-pinning is opt-in and replaces `FromCallConfig::namespace_prefix` as the
disambiguation mechanism. [§3.4]

### 5. `from_call` peer-keyed registration; collision rule change

`from_call` registers into the specific peer's sub-overlay. Cross-peer
collision dissolves (same name on different peers is fine). Same-peer
collision stays an error. `namespace_prefix` becomes optional local-naming
sugar. [§3.6]

### 6. `services/list` AccessControl-filtered; `services/list-peers` opt-in

`services/list` filters by `AccessControl::check(calling_peer_identity)` (not
`remote_safe`). `services/list-peers` is the opt-in for peer-attributed
re-export listing. [§3.5]

## Consequences

[Summarize §3 + §5: OQ-25 and OQ-28 (cross-peer) dissolve; OQ-26/27/29 stay;
new OQ-30/31/32 surfaced. Positive: head→N-workers works, one authorization
system not two, structural disconnect cleanup. Negative: `OperationEnv` trait
gains a method (back-compat default-impl), `CompositeOperationEnv` →
`PeerCompositeEnv` migration, `services/list` semantics change.]

## Assumptions

[Summarize §6: PeerId stability, Any determinism, reachability ordering,
capability exposure under Any.]

## References

- ADR-015 (privilege model; the authority-switch pattern ADR-028 violated)
- ADR-017 (client/adapter contract; amended: CallClient no longer has
  trusted_peer)
- ADR-022 (registration bundle; remote_safe field removed)
- ADR-024 (registry layering; Layer 2 becomes peer-keyed)
- ADR-028 (superseded)
- OQ-25 (dissolved), OQ-26/27/29 (stay), OQ-28 (cross-peer dissolved),
  OQ-30/31/32 (new)
- Research: this document
- Prior art: Ray.io actors, Dapr service invocation
```
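
Decision items 3 and 6 in the draft above both reduce to one check: the operation's `AccessControl` evaluated against the calling peer's `Identity`. The sketch below illustrates that idea only. The `Identity` and `AccessControl` shapes here (a scope list and a two-variant enum) are invented stand-ins, not the real alknet-call types, and `list_services` is a simplified analogue of the POC's `list_services_peer_attributed`. What it shows is that call authorization and `services/list` filtering share a single predicate, so no separate `remote_safe` flag is needed.

```rust
// Hypothetical stand-ins; the real Identity and AccessControl live in alknet-call
// and almost certainly carry more than a scope list.
pub struct Identity {
    pub id: &'static str,
    pub scopes: Vec<&'static str>,
}

pub enum AccessControl {
    Public,
    RequireScope(&'static str),
}

impl AccessControl {
    // The single predicate used for every caller, local or peer.
    pub fn check(&self, identity: &Identity) -> bool {
        match self {
            AccessControl::Public => true,
            AccessControl::RequireScope(scope) => identity.scopes.iter().any(|s| s == scope),
        }
    }
}

pub struct Operation {
    pub name: &'static str,
    pub access: AccessControl,
}

// Gate a peer call: the op's AccessControl is the peer-authorization policy.
pub fn authorize_peer_call(op: &Operation, peer: &Identity) -> Result<(), String> {
    if op.access.check(peer) {
        Ok(())
    } else {
        Err(format!("{} may not call {}", peer.id, op.name))
    }
}

// List only what the calling peer could actually invoke.
pub fn list_services(ops: &[Operation], peer: &Identity) -> Vec<&'static str> {
    ops.iter()
        .filter(|op| op.access.check(peer))
        .map(|op| op.name)
        .collect()
}
```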

---

## 12. Confirmation: POC Removed, Build Clean

- Scratch module `crates/alknet-call/src/scratch_peer_routing.rs`: **removed**.
- `crates/alknet-call/src/lib.rs`: **restored** to original (no scratch module
  reference).
- `crates/alknet-call/Cargo.toml`: **restored** (no `scratch-peer-routing`
  feature).
- `cargo check -p alknet-call`: **passes** (clean).
- `cargo test -p alknet-call --lib`: **207 passed; 0 failed**.

Only the research doc (`docs/research/alknet-call-peer-routing/findings.md`)
and the ADR draft (§11, to be split out as ADR-029) survive.