docs(http): resolve OQ-39; add ADRs 045-047; record pubsub prior art for WS path

OQ-39 (to_openapi published-spec versioning) resolved by ADR-045:
info.version semver tracks the gateway endpoint contract, not the
operation set — per-caller operations discovered via /search do not
bump the version. The gateway pattern (ADR-042) dissolved most of the
original churn concern.

ADR-046: assembly-layer custom HTTP routes on HttpAdapter. The HTTP
router had no documented extension point for deployment-specific
endpoints (e.g., an OAI-compatible proxy at /v1/chat/completions). Adds
extra_routes: Option<Router> at construction; raw HTTP, not operations;
default surface takes precedence on collision. The mechanism is the
one-way door; specific routes are two-way.

ADR-047: remove the direct-call POST /{service}/{op} HTTP surface. The
gateway /call is the sole invoke path — the simplified contract is a
few fixed endpoints, not a per-operation REST tree. The direct-call
surface re-introduced the 'dump the full API regardless of privs'
failure mode at the HTTP level that the gateway /search was built to
escape. ADR-036's routing decision is superseded; its non-routing
clauses (SSE, Bearer auth, /healthz, stealth, error mapping) survive.
A deployment wanting a REST-like per-operation surface builds it as a
custom route projection (ADR-046).

ADR-044 updated with the tradeoff framing (WSS is the right tool for
the call-protocol-from-browser case; WebTransport is the right tool for
the generalized ALPN-stream-proxy case we don't have yet — coexist, not
migrate) and the @alkdev/pubsub concrete prior art (the EventEnvelope
{type,id,payload} the call protocol was derived from already has a
working WebSocket client/server; the sync is a small adjustment, not a
from-scratch build).

call-protocol.md references the pubsub lineage for the
transport-agnosticism claim.
2026-06-30 09:49:25 +00:00
parent 3327d585da
commit 2a6e4c371a
14 changed files with 1082 additions and 129 deletions


@@ -2,7 +2,13 @@
## Status
Proposed
Proposed — **routing decision superseded by
[ADR-047](047-remove-direct-call-http-surface.md)** (the direct-call
surface `POST /{service}/{op}` is removed; the gateway `/call` is the
sole invoke path). ADR-036's other clauses — SSE projection, Bearer
auth, `/healthz`, stealth decoy, error mapping, `External`-only
dispatch — remain in force (see ADR-047 §"What survives from
ADR-036"). The `to_openapi` clause was already superseded by ADR-042.
## Context
@@ -42,6 +48,15 @@ routing table that has to be kept in sync with the registry.
## Decision
> **Routing decision superseded by
> [ADR-047](047-remove-direct-call-http-surface.md).** The direct-call
> surface defined below (`POST /{service}/{op}` → `call.requested`) is
> removed — the gateway's `/call` endpoint (ADR-042) is the sole invoke
> path over HTTP. This section is retained as the historical record of
> the original decision; ADR-047 records the reversal and what survives.
> The `to_openapi` clause below was already superseded by ADR-042 (see
> the amendment in this section).
**Direct path mapping is the default HTTP surface; `to_openapi` is the
discovery/projection layer, not a parallel router.**
@@ -68,10 +83,16 @@ serves; there is no second mapping.
> table" property is preserved (the gateway endpoints are fixed; the
> per-caller operation surface is discovered via `/search`, not preloaded
> into a generated path set). The direct-call surface (`POST
> /{service}/{op}`) that this ADR defines is **unchanged** — ADR-042 only
> changes what `to_openapi` *describes*, not what the HTTP handler
> *serves*. A traditional per-operation-paths OpenAPI projection remains
> available as an additive alternative (ADR-042 §5).
> /{service}/{op}`) that this ADR defines was **unchanged at the time**
> — ADR-042 only changed what `to_openapi` *describes*, not what the
> HTTP handler *serves*. **The direct-call surface was later removed by
> [ADR-047](047-remove-direct-call-http-surface.md)** (the gateway
> `/call` is the sole invoke path; the simplified contract is a few
> fixed endpoints, not a per-operation REST tree). A traditional
> per-operation-paths OpenAPI projection remains available as an
> additive alternative (ADR-042 §5), and a deployment that wants the
> former direct-call HTTP surface builds it as a custom route
> projection (ADR-046).
### HTTP method semantics
@@ -212,7 +233,11 @@ without auth before identity is resolvable.
- [ADR-042](042-openapi-gateway-pattern.md) — supersedes this ADR's
`to_openapi` clause (the per-operation-paths projection is replaced by
the 5-endpoint gateway pattern; the direct-call surface this ADR
defines is unchanged)
defines was unchanged *at the time*; ADR-047 later removes it)
- [ADR-047](047-remove-direct-call-http-surface.md) — supersedes this
ADR's routing decision (the direct-call surface is removed; the
gateway `/call` is the sole invoke path). This ADR's non-routing
clauses survive.
- OQ-13 (resolved) — operation path format `/{service}/{op}`
- `docs/research/alknet-http/phase-0-findings.md` DH-3 — the decision this
ADR resolves


@@ -130,10 +130,11 @@ Once published, the gateway endpoint set (5 endpoints) and the
request/response shapes are a compatibility contract (ADR-017
Consequences). Adding endpoints is additive (non-breaking); removing or
renaming is a one-way door. The initial 5-endpoint set is the published
contract. The versioning strategy for the generated doc is tracked as
OQ-39 (same as the per-operation-paths versioning question — the
gateway pattern doesn't eliminate the versioning concern, it simplifies
it to 5 stable endpoints instead of a per-operation surface).
contract. The versioning strategy for the generated doc was tracked as
OQ-39 (now **resolved by [ADR-045](045-to-openapi-gateway-spec-versioning.md)**:
`info.version` semver tracks the gateway endpoint contract, not the
operation set) — the gateway pattern simplifies versioning to 5 stable
endpoints instead of a per-operation surface.
### 5. A traditional per-operation-paths projection is additive, not replacement
@@ -244,5 +245,6 @@ require it for the common case.
pattern for `to_mcp` (4 tools; `subscribe` excluded because MCP tool
calls are request/response)
- OQ-39 — `to_openapi` published-spec versioning (simplified by the
gateway pattern to 5 stable endpoints)
gateway pattern to 5 stable endpoints; **resolved by
[ADR-045](045-to-openapi-gateway-spec-versioning.md)**)
- `crates/http/http-adapters.md` — the spec that implements the gateway


@@ -98,6 +98,61 @@ protocol bidirectionally," their answer was WSS, not WebTransport. Aligning
with that precedent is not cutting against competent practice — it is
matching it.
### Concrete prior art: `@alkdev/pubsub`
The WebSocket path is not speculative — there is working prior art in the
same workspace. The `@alkdev/pubsub` package (`/workspace/@alkdev/pubsub/`)
already has a WebSocket client (`event-target-websocket-client.ts`) and
server (`event-target-websocket-server.ts`) built on a generalized "event
target" abstraction with an `EventEnvelope { type, id, payload }` shape.
The alknet call protocol's `EventEnvelope` was derived from this envelope
(refined with typed event names `call.requested`/`call.responded`/etc. and
structured payloads); the sibling `@alkdev/operations` package
(`/workspace/@alkdev/operations/`) shares the lineage and uses the
`path.to.op` (dot-separated) convention where alknet uses `path/to/op`
(slash-separated): a minor, mechanical delta. Syncing the pubsub/operations
WebSocket client to the alknet call protocol's envelope is a small adjustment
(~a day of work: the envelope shape, the event-name typing, the path
separator), not a from-scratch browser-client build. This is why the
WebSocket path opens doors quickly: the browser (and Node) client is
mostly already written.
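
The three deltas named above (envelope shape, event-name typing, path separator) can be sketched as follows. This is a minimal illustration, not the actual `@alkdev/pubsub` or alknet code: the struct names `PubsubEnvelope` and `CallEnvelope`, the string payload, and the helper functions are all assumptions made for the example, and only the two event names spelled out in the text are modeled.

```rust
/// Only the two event names the text spells out are modeled; the rest of
/// the set is elided in the source ("etc."), so other names are rejected.
#[derive(Debug, PartialEq)]
pub enum CallEvent {
    Requested,
    Responded,
}

/// Hypothetical pubsub-style envelope: untyped event name, opaque payload.
#[derive(Debug, PartialEq)]
pub struct PubsubEnvelope {
    pub event_type: String,
    pub id: String,
    pub payload: String,
}

/// Hypothetical call-protocol envelope: typed event name, same id and payload.
#[derive(Debug, PartialEq)]
pub struct CallEnvelope {
    pub event: CallEvent,
    pub id: String,
    pub payload: String,
}

/// Event-name typing: map the wire string to a typed event.
pub fn parse_event(name: &str) -> Option<CallEvent> {
    match name {
        "call.requested" => Some(CallEvent::Requested),
        "call.responded" => Some(CallEvent::Responded),
        _ => None,
    }
}

/// Path separator: dot-separated operation path to slash-separated.
pub fn dot_path_to_slash(path: &str) -> String {
    path.replace('.', "/")
}

/// Envelope shape: carry id and payload over unchanged, type the event name.
pub fn adapt(env: PubsubEnvelope) -> Option<CallEnvelope> {
    let event = parse_event(&env.event_type)?;
    Some(CallEnvelope { event, id: env.id, payload: env.payload })
}
```

Under these assumptions the adaptation is a field-for-field mapping plus a string replace, which is consistent with the "small adjustment" estimate above.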
### The tradeoff between two use cases, not "good enough for now"
It is worth being precise about *why* WSS is the right choice here, because
"good enough until it isn't" undersells the decision. The two browser-reach
use cases have different right tools:
- **The call protocol from a browser (bidirectional).** WSS is *genuinely
the right tool*, not a stopgap. The call protocol multiplexes by request
ID (ADR-012), not by stream — it does not need WebTransport's per-stream
multiplexing. A WebSocket is a full-duplex, long-lived, framed-message
channel; the call protocol's `EventEnvelope` framing fits a WS binary
message cleanly (one envelope = one message). For this use case,
WebTransport's stream model is engineering sophistication the call protocol
has no use for. WSS is not "good enough" — it is well-matched.
- **The generalized ALPN router/proxy (a browser reaching a non-call ALPN
— SSH/SFTP/git via WASM).** WebTransport's native multi-stream model is
*genuinely the right tool* here, and WSS is *probably worse* for it. A
browser reaching a non-call ALPN over WSS would have to multiplex logical
streams over one WS frame stream by application-level framing — doable
(ADR-043 §"SSH/SFTP/git-over-WSS-from-a-browser is technically possible"),
but it re-implements at the application layer what WebTransport gives at
the transport layer. This is the use case WebTransport was built for, and
it is the speculative one (Finding 3) — the consumers (WASM SSH/SFTP/git
parsers) do not exist yet.
So the deferral is not "use the worse tool now, upgrade to the better tool
later." It is "use the right tool for the use case we *have* (call protocol
from a browser → WSS), and defer building the tool for the use case we
*don't have yet* (generalized ALPN proxy → WebTransport)." When WebTransport
arrives, the two coexist (§Reversal point 3): WSS stays as the simpler
call-protocol path; WebTransport adds the ALPN-stream-proxy path. Neither
replaces the other. This is "good enough is good enough until it isn't" in
the precise sense: WSS is good enough for the call-protocol case *because
it is the right tool*, and the case where WebTransport would be better is
a case we don't have yet.
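
To make the "multiplexes by request ID, not by stream" point concrete, here is a minimal sketch of request-ID correlation over a single message channel. The `PendingCalls` type and its method names are hypothetical, chosen for illustration; the real call protocol's bookkeeping (ADR-012) may differ.

```rust
use std::collections::HashMap;

/// Tracks in-flight calls on ONE connection, keyed by request ID.
/// No per-call stream is needed: the ID alone pairs responses with requests.
#[derive(Default)]
pub struct PendingCalls {
    // request ID -> operation path the caller is waiting on
    waiting: HashMap<String, String>,
}

impl PendingCalls {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record an outgoing `call.requested` envelope.
    pub fn requested(&mut self, id: &str, op: &str) {
        self.waiting.insert(id.to_string(), op.to_string());
    }

    /// Match an incoming `call.responded` envelope to its request.
    /// Returns the operation path, or None if the ID is not in flight.
    pub fn responded(&mut self, id: &str) -> Option<String> {
        self.waiting.remove(id)
    }

    /// Number of calls still awaiting a response.
    pub fn in_flight(&self) -> usize {
        self.waiting.len()
    }
}
```

Responses may arrive in any order on the one WebSocket; each is resolved by a map lookup, which is why WebTransport's per-stream multiplexing adds nothing for this use case.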
## Decision
### 1. Defer `h3`/WebTransport. Browsers reach the call protocol over WebSocket.
@@ -241,9 +296,12 @@ by reference to this section.
is a real deployment needing it.
- WebSocket is a single stream; it lacks WebTransport's native multi-stream
multiplexing. For the call protocol this is fine (correlation is by request
ID, not by stream — ADR-012), but it means a future migration to
WebTransport would be a genuine upgrade, not a no-op. The migration path
is the spec that already exists (`webtransport.md`).
ID, not by stream — ADR-012), and WSS is the well-matched tool for that use
case (see §"The tradeoff between two use cases"). Where WebTransport's
stream model would matter is the ALPN-stream-proxy (ADR-040) — the
speculative use case whose deferral this ADR commits. The migration path
is the spec that already exists (`webtransport.md`), and when WebTransport
arrives it coexists with WSS rather than replacing it.
- ADR-043's "WebTransport restores bidirectionality" framing (§5) becomes
"WebSocket restores bidirectionality" for v1. The framing transfer is clean
(§3 above), but the prose in `http-server.md` and the ADRs must reflect it.
@@ -290,11 +348,15 @@ from scratch. See `webtransport.md` §"Research note" for the cross-reference.
1. **The call protocol's `EventEnvelope` framing fits a WebSocket binary
message boundary cleanly.** An `EventEnvelope` is a self-delimited JSON
object; one envelope per WS binary message. No streaming deserializer
across message boundaries is needed. This is verified by implementation
when the WS browser path is built, not by a separate research spike — the
call protocol spec (`call-protocol.md`) and the EventEnvelope shape
already make this property clear, and WebSocket binary messages are a
standard byte-framed transport.
across message boundaries is needed. This is already verified by prior
art: the `@alkdev/pubsub` WebSocket client/server
(`/workspace/@alkdev/pubsub/src/event-target-websocket-client.ts`,
`event-target-websocket-server.ts`) carries the same
`{ type, id, payload }` envelope over WS binary messages — the alknet
`EventEnvelope` is a refined superset of that shape (typed event names,
structured payloads). The call protocol spec (`call-protocol.md`) and
the EventEnvelope shape make the property clear, and the pubsub prior
art demonstrates it concretely.
2. **WebSocket upgrade over HTTP/1.1 or HTTP/2 is supported by the axum/
hyper stack natively.** `axum::extract::ws` provides the upgrade handler;


@@ -0,0 +1,177 @@
# ADR-045: to_openapi Gateway-Spec Versioning
## Status
Proposed
## Context
OQ-39 asked how the published `to_openapi` spec is versioned. ADR-017
Consequences established that a published `to_*` spec is a compatibility
contract: once external clients build against it, the mapping semantics
become a de facto contract and changing them breaks every client.
The original framing of OQ-39 assumed `to_openapi` generated a
traditional per-operation-paths OpenAPI doc — one path per `External`
operation, changing whenever an operation is added, removed, or has its
schema modified. Under that model the versioning surface is large and
churns constantly, and the doc is a static full-surface dump (the Gitea
failure mode: admin ops shown to every caller, no per-caller filtering).
ADR-042 replaced that model with the **gateway pattern**: `to_openapi`
generates a doc describing **5 fixed gateway endpoints**
(`/search`, `/schema`, `/call`, `/batch`, `/subscribe`), and the
per-caller operation surface is discovered at runtime through
`AccessControl`-filtered `/search` — not preloaded into the static doc.
This is the same mechanic as the MCP gateway (ADR-041), with `subscribe`
added because OpenAPI/SSE supports streaming where MCP tool calls are
request/response.
The consequence for versioning: the published doc is now a small, stable
surface that changes only when the gateway endpoint set or an endpoint's
request/response shape changes. Per-caller operation changes
(adding/removing/modifying operations, changing an operation's schema)
do **not** change the published doc — those operations are not in the
doc; they are discovered via `/search`. This dissolves most of the
churn the original OQ-39 was concerned about.
What remains is the narrow versioning question: how does the published
gateway doc signal its version so consumers can detect breaking changes?
This is one-way after first publication — once external clients build
against the gateway doc, renaming `/call` or changing its request shape
breaks them.
A note on door-type framing: ADR-009 classifies doors by reversal cost
in the codebase. The "published artifact is a contract" case is a blind
spot in that framework — the published doc's reversal cost is paid by
external consumers, not in the codebase. ADR-017 Consequences captures
this (published `to_*` specs are compatibility contracts); this ADR
honors the constraint without changing ADR-009's framework. The door is
two-way before first publication (the gateway shape can be revised
freely while no external client depends on it) and one-way after
(revising requires a major version bump that signals breakage to
consumers).
## Decision
### 1. The published gateway doc carries a semver `info.version`
`to_openapi` emits `info.version` as a semver string. The version
reflects the **gateway endpoint contract** (the 5 endpoints + their
request/response shapes), not the operation set:
- **Major bump** — breaking change to the gateway contract: an endpoint
removed or renamed, a required field added to a gateway endpoint's
request, a response shape changed in a backward-incompatible way
(including removing or retyping an existing response field, or
tightening an optional field to required), or the error-mapping
semantics (ADR-023) changed.
- **Minor bump** — additive change: a new gateway endpoint added
(e.g., a future `/subscribe-batch`), a new optional request field, a
new response field. Additive changes do not break existing clients.
- **Patch bump** — description/wording changes, documentation, no shape
change.
Cases not enumerated above follow **standard semver**: a change is a
major bump if it could break a client built against the prior version,
a minor bump if it is purely additive, a patch bump otherwise. The
enumerated triggers above are the common cases, not an exhaustive list.
Per-caller operation changes (registering a new operation, removing one,
changing an operation's input schema) **do not bump the version** — the
operation set is not part of the published doc; it is discovered via
`/search` at runtime. This is the key simplification the gateway pattern
buys: the operation surface can evolve freely without touching the
published contract version.
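
The bump rules in this section can be sketched as a pure function from a change kind to the next version. This is an illustrative sketch only: the `Version` and `Change` types and the variant names are assumptions, not part of `to_openapi`. Resetting the lower components on a major or minor bump follows standard semver and is not stated explicitly in this ADR.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

/// Hypothetical change kinds, mirroring the enumerated triggers above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Change {
    // Breaking: major bump
    EndpointRemovedOrRenamed,
    RequiredRequestFieldAdded,
    IncompatibleResponseChange,
    ErrorMappingChanged,
    // Additive: minor bump
    EndpointAdded,
    OptionalRequestFieldAdded,
    ResponseFieldAdded,
    // Wording or documentation only: patch bump
    WordingOnly,
    // An operation was registered, removed, or had its schema changed.
    // Not part of the published doc, so no bump.
    OperationSetChanged,
}

pub fn bump(v: Version, c: Change) -> Version {
    use Change::*;
    match c {
        EndpointRemovedOrRenamed
        | RequiredRequestFieldAdded
        | IncompatibleResponseChange
        | ErrorMappingChanged => Version { major: v.major + 1, minor: 0, patch: 0 },
        EndpointAdded | OptionalRequestFieldAdded | ResponseFieldAdded => {
            Version { minor: v.minor + 1, patch: 0, ..v }
        }
        WordingOnly => Version { patch: v.patch + 1, ..v },
        OperationSetChanged => v,
    }
}
```

The last arm is the simplification the gateway pattern buys: operation-set changes return the version unchanged.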
### 2. The version is bumped on change to the gateway shape, not on regeneration
A deployment that regenerates the doc (e.g., on restart) gets the same
`info.version` unless the gateway shape changed. The version is a
function of the gateway contract, not of when the doc was generated.
### 3. Consumers detect breaking changes via the major version
A client reading the doc compares `info.version`'s major component to
the version it built against. A major bump signals "re-read the doc,
something broke." The minor/patch components are informational. This is
the standard OpenAPI/semver convention — no alknet-specific detection
mechanism.
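
A client-side check along these lines might look like the sketch below. The function names are hypothetical, and the parsing is deliberately minimal (it reads only the leading `MAJOR` component and ignores pre-release or build metadata); a real client would more likely use a semver library.

```rust
/// Extract the major component of a "MAJOR.MINOR.PATCH" string.
/// Returns None if the leading component is not a non-negative integer.
pub fn major_of(version: &str) -> Option<u64> {
    version.split('.').next()?.parse().ok()
}

/// True when the published doc's major version differs from the one the
/// client was built against, i.e. the client should re-read the doc.
/// Returns None if either version string cannot be parsed.
pub fn needs_reread(built_against: &str, published: &str) -> Option<bool> {
    Some(major_of(built_against)? != major_of(published)?)
}
```

For example, a client built against `1.4.2` treats a published `1.9.0` as compatible and a published `2.0.0` as a signal to re-read the doc. Treating any difference in major version (not only an increase) as a mismatch is a choice made for this sketch.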
### 4. The traditional per-operation-paths projection (additive, ADR-042 §5) versions independently
A deployment that builds the additive traditional REST projection
(ADR-042 §5) versions that doc on its own schedule — its surface
*does* change with the operation set, so its versioning is the
per-operation churn OQ-39 originally worried about. That projection is
opt-in and out of scope for this ADR; the gateway doc is the default
published contract and the one this ADR governs.
## Consequences
**Positive:**
- The published contract is a 5-endpoint surface that rarely changes.
Versioning is bump-on-change, not bump-on-every-operation-change. The
original OQ-39 concern (constant churn) is dissolved by the gateway
pattern — the operation set is not in the doc.
- Consumers use standard semver/OpenAPI `info.version` — no
alknet-specific version-detection mechanism to learn.
- Per-caller operation evolution (the common case) is decoupled from the
published-contract version. A node can add/remove operations freely
without bumping the doc version or breaking clients built against the
gateway doc.
- The Gitea failure mode stays structurally impossible (ADR-042 §3):
`/search` is `AccessControl`-filtered, so the doc never exposes ops
the caller can't call. Versioning inherits this — the doc describes
the gateway, not the operations.
**Negative:**
- A client cannot tell from the doc version alone *which* operations are
available — it must call `/search`. This is by design (per-caller,
runtime), but a client expecting a static operation list from the doc
must learn the gateway pattern.
- The version only signals gateway-contract changes. An operation
changing its input schema (a breaking change for callers of that
operation) does not bump the doc version — that change is surfaced via
`/schema` per-operation, not via the doc version. Clients that cache
operation schemas must re-fetch `/schema` to detect per-operation
changes; the doc version does not track them.
## Assumptions
1. **The 5-endpoint gateway set is stable.** ADR-042 Assumption 1. Adding
endpoints is additive (minor bump); removing/renaming is a major bump.
The initial 5-endpoint set is the first published contract.
2. **Per-operation schema changes are detected via `/schema`, not the
doc version.** The doc version tracks the gateway contract only. A
client that caches an operation's `OperationSpec` re-fetches `/schema`
to detect changes to that operation. This is the standard
discovery-then-invoke pattern; the doc version is not a per-operation
change tracker.
3. **`info.version` is the single source of truth for the published
contract version.** No separate `x-alknet-version` extension or
content-hash header. Standard OpenAPI field, standard semver
interpretation. A content-hash would be more precise but adds an
alknet-specific mechanism for no real gain over semver-on-shape-
change.
## References
- [ADR-009](009-one-way-door-decision-framework.md) — door-type
framework (classifies by codebase reversal cost; the
published-artifact-as-contract case is the blind spot this ADR honors
without changing the framework)
- [ADR-017](017-call-protocol-client-and-adapter-contract.md) — published
`to_*` specs are compatibility contracts (the one-way-after-
publication constraint)
- [ADR-023](023-operation-error-schemas.md) — error-mapping semantics
are part of the gateway contract (a change to them is a major bump)
- [ADR-036](036-http-to-call-operation-mapping.md) — the SSE projection
for `/subscribe` (part of the gateway contract)
- [ADR-042](042-openapi-gateway-pattern.md) — the gateway pattern that
makes the published doc a 5-endpoint surface instead of a per-
operation surface; §4 explicitly deferred versioning to OQ-39
- OQ-39 — `to_openapi` published-spec versioning (resolved by this ADR)
- `crates/http/http-adapters.md` — the spec that emits `info.version`


@@ -0,0 +1,248 @@
# ADR-046: Assembly-Layer Custom HTTP Routes on HttpAdapter
## Status
Proposed
## Context
The `HttpAdapter` (`crates/http/http-server.md`) is constructed by the
assembly layer with an `Arc<dyn IdentityProvider>`, an
`Arc<OperationRegistry>`, and a `DecoyConfig`. The axum `Router` it
builds has a fixed surface:
- The `to_openapi` gateway endpoints (`/search`, `/schema`, `/call`,
`/batch`, `/subscribe` — ADR-042) — the sole invoke path over HTTP
(ADR-047 removed the direct-call `POST /{service}/{op}` surface that
ADR-036 originally defined).
- `/healthz`, `/openapi.json`, the MCP route (feature-gated), and the
decoy fallback for unknown paths.
There is no documented extension point for a downstream deployment to
add its own HTTP routes to this router. A deployment that wants to
expose a custom HTTP endpoint — one that is *not* a gateway endpoint —
has no specified way to do so. The architecture currently ties the HTTP
surface to the simplified contract with no escape hatch.
### The concrete use case
A hub deployment (e.g., `api.alk.dev`) wants to expose the standard
alknet contract (the gateway endpoints) **and** an OpenAI-compatible
proxy at `/v1/chat/completions`. The OAI proxy is a custom HTTP route:
it receives an OpenAI-shaped request, dispatches into the
`OperationRegistry` (likely to a `from_openapi`-imported `openai/chat`
operation or a custom agent operation), and returns an OpenAI-shaped
response. It is not an alknet operation — it is a deployment-specific
HTTP endpoint that uses the registry as a backend.
This pattern is not exotic. It is the standard "wrap an external API
shape around our operations" pattern: a deployment adds a
compatibility shim (OAI-compatible, Anthropic-compatible, a legacy API
shape) as a custom route, backed by call-protocol operations. The
alternative — forcing every custom endpoint to be a call-protocol
operation whose input/output match the external API's shape — is
brittle (the OAI streaming response shape is not a clean call-protocol
output) and unnecessary (the deployment owns the HTTP shape; the
registry owns the operation shape).
The runner pattern that motivates this (remote GPU instance downloads
a binary, connects back to the hub via `from_call`, registers its ops;
opencode connects to the hub as a standard OAI provider) is already
supported by the existing architecture (`from_call`, `PeerRef` routing,
`from_openapi` to wrap the OAI API). The only missing piece is the
custom HTTP route on the hub.
### Why this needs an ADR
The extension mechanism — how the assembly layer injects custom routes
— is a published API surface of `HttpAdapter`. Once downstream
deployments build against it (passing their custom routers at
construction), changing the mechanism is a one-way door (every consumer
construction site breaks). It needs an ADR before implementation so the
contract is deliberate, not accidental.
The specific routes a deployment adds are a two-way door (add/remove
freely, no protocol contract). The *mechanism* (the constructor
parameter and its semantics) is the one-way door this ADR commits.
## Decision
### 1. HttpAdapter accepts additional axum routes from the assembly layer
The `HttpAdapter` constructor gains a parameter for deployment-specific
routes. The assembly layer builds an `axum::Router` with its custom
routes and passes it in; `HttpAdapter` composes it with the default
surface. A deployment that passes no custom routes gets exactly the
documented default behavior — the extension point is additive, not
mandatory.
```rust
pub struct HttpAdapter {
identity_provider: Arc<dyn IdentityProvider>,
registry: Arc<OperationRegistry>,
decoy: DecoyConfig,
/// Deployment-specific routes added by the assembly layer. None =
/// the default surface only. See ADR-046.
extra_routes: Option<Router>,
}
```
The exact composition mechanism (merge vs nest vs builder, whether
custom routes get a prefix) is a two-way-door implementation detail;
the one-way constraint is that the assembly layer can inject routes
and they coexist with the default surface. axum's `Router::merge` /
`Router::nest` are the natural primitives.
### 2. Custom routes are raw HTTP, not call-protocol operations
A custom route is a raw axum handler — it receives an HTTP request and
returns an HTTP response. It is not registered in the
`OperationRegistry`, not discoverable via `/search`, not described in
the `to_openapi` gateway doc. The deployment owns its shape entirely.
A custom route *may* dispatch into the `OperationRegistry` (via
`OperationRegistry::invoke()`, same as the gateway's `/call` endpoint
does) if it wants to back the HTTP endpoint with a call-protocol
operation. The
OAI-compatible proxy does this: the `/v1/chat/completions` handler
parses the OAI request, invokes the `openai/chat` (or `agent/chat`)
operation, and reformats the response as an OAI response. But this is
the custom route's choice — it could equally be a pure HTTP handler
that never touches the registry (a webhook receiver, a static asset
server, a legacy API shim with its own backend).
### 3. The default surface's reserved paths take precedence on collision
The default-surface paths are reserved: `/search`, `/schema`, `/call`,
`/batch`, `/subscribe`, `/healthz`, `/openapi.json`, and the MCP route.
(ADR-047 removed the direct-call `/{service}/{op}` surface, so it is no
longer a reserved path; a deployment that builds a per-operation
projection as a custom route is the one case where `/{service}/{op}`
patterns appear, and those custom routes are subject to the same
collision rule.) If a custom route collides with a reserved path, the
default surface wins — the custom route is silently shadowed (or the
construction panics/warns; the specific collision-handling is a
two-way-door implementation detail). A deployment that wants
`/v1/chat/completions` namespaces it away from the reserved set, which
is natural (`/v1/...` doesn't collide).
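
One way an assembly layer could detect collisions before construction is sketched below. This is an assumption-laden illustration: the `collisions` helper is hypothetical, the MCP route is omitted because its path is not given here, and the comparison is exact string equality, so it does not model axum path patterns or trailing-slash handling.

```rust
/// Reserved default-surface paths listed in this section.
/// The MCP route is left out because its path is not specified here.
pub const RESERVED_PATHS: [&str; 7] = [
    "/search",
    "/schema",
    "/call",
    "/batch",
    "/subscribe",
    "/healthz",
    "/openapi.json",
];

/// Return the custom paths that exactly match a reserved path.
/// What to do with them (shadow, warn, panic) is left to the caller,
/// matching the two-way-door framing above.
pub fn collisions<'a>(custom: &[&'a str]) -> Vec<&'a str> {
    custom
        .iter()
        .copied()
        .filter(|p| RESERVED_PATHS.iter().any(|r| *r == *p))
        .collect()
}
```

A deployment registering `/v1/chat/completions` gets an empty result; one that accidentally registers `/call` gets that path back and can decide how to handle it.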
### 4. Custom routes carry the same auth middleware by default; per-route opt-out is the deployment's choice
Custom routes run under the same Bearer-auth resolution as the default
surface (the `Authorization: Bearer` → `resolve_from_token` path). A
deployment that wants a custom route to be unauthenticated (a public
webhook receiver, a health endpoint with a different shape than
`/healthz`) applies axum middleware to opt that route out of auth —
the deployment owns its custom routes' middleware stack. The
`HttpAdapter` provides the identity provider and the default auth
middleware; the custom `Router` the assembly layer passes in can
layer its own middleware on top. This is standard axum composition; no
alknet-specific mechanism.
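
As a rough sketch of the first half of that path, extracting the token from an `Authorization` header value could look like this. The `bearer_token` helper is hypothetical; the signature of `resolve_from_token` is not given in this ADR, so the sketch stops at token extraction. Matching the scheme case-sensitively is a simplification made here (HTTP auth scheme names are case-insensitive).

```rust
/// Extract the token from an `Authorization` header value of the form
/// `Bearer <token>`. Returns None for other schemes or an empty token.
/// Simplification: the scheme is matched case-sensitively.
pub fn bearer_token(header_value: &str) -> Option<&str> {
    let rest = header_value.strip_prefix("Bearer ")?;
    let token = rest.trim();
    if token.is_empty() {
        None
    } else {
        Some(token)
    }
}
```

A custom route that keeps the default auth would receive the identity resolved from this token; a route that opts out simply never reaches this step.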
### 5. Custom routes are not part of the published `to_openapi` doc
The `to_openapi` gateway doc (ADR-042, ADR-045) describes the 5
gateway endpoints — the default contract. Custom routes are
deployment-specific and not described by `to_openapi`. A deployment
that wants its custom routes documented for external consumers
generates its own OpenAPI doc for them (a separate projection, not
`to_openapi`). The default `info.version` semver (ADR-045) tracks the
gateway contract, not custom routes — custom routes have no
versioning contract with alknet; the deployment versions them however
it wants.
### 6. This does not change the default surface
A deployment that constructs `HttpAdapter` with no extra routes gets
exactly the behavior documented in `http-server.md`: the gateway
endpoints, `/healthz`, `/openapi.json`, MCP (feature-gated), decoy. The
extension point is purely additive. The default surface remains the
published contract (ADR-036, ADR-042, ADR-045); custom routes are a
deployment-specific addition on top, not a modification of it.
## Consequences
**Positive:**
- Deployments can wrap external API shapes (OAI-compatible,
Anthropic-compatible, legacy) around call-protocol operations without
forcing the external shape into the operation's input/output schema.
The "compatibility shim" pattern is first-class.
- The runner pattern (remote worker → `from_call` → hub → custom OAI
route → opencode) works end-to-end with no architectural gap. The hub
is a standard alknet node *plus* a deployment-specific HTTP surface.
- The extension point is standard axum composition — no alknet-specific
routing abstraction for deployers to learn. A developer who knows
axum can add routes.
- The default surface is unchanged for deployments that don't need
custom routes. No complexity tax for the common case.
**Negative:**
- The HTTP surface is no longer fully described by the alknet specs
alone — a deployment's custom routes are outside the architecture
docs. This is inherent to the extension point (the deployment owns
them); the specs describe the *default* surface and the *mechanism*,
not every possible custom route.
- A custom route that dispatches into the registry bypasses the
gateway's `AccessControl`-filtered `/search` discovery — the custom
route is responsible for its own authorization story. The default
Bearer-auth middleware covers the common case, but a custom route
that wants per-operation ACL checks must call
`OperationRegistry::invoke()` with a proper `OperationContext`
(caller identity from the resolved bearer token), not a bypass. The
`invoke()` path enforces `AccessControl` regardless of the entry
point (gateway or custom route), so this is not an
ACL bypass — but the custom route author must construct the context
correctly.
- Two deployments with custom routes have different HTTP surfaces —
there is no single "what does an alknet HTTP endpoint look like"
answer anymore. The default surface is the contract; custom routes
are deployment-specific variance. This is honest (deployments
*do* vary) but means the architecture docs describe the default, not
the union.
## Assumptions
1. **The assembly layer is the composition point.** Custom routes are
added at `HttpAdapter` construction, not registered dynamically at
runtime. This matches the static-registration constraint (OQ-04 /
ADR-010) for the `HandlerRegistry`; the `HttpAdapter`'s router is
likewise immutable after construction. Dynamic route addition would
require `ArcSwap<Router>` and is not part of this ADR.
2. **Custom routes are a deployment concern, not an alknet-crate
concern.** `alknet-http` provides the extension point (accepts the
extra `Router`); it does not provide custom route implementations.
The OAI-compatible proxy, the legacy API shim, the webhook receiver
are all written by the deployment (or a downstream crate like
`alknet-agent` that builds on `alknet-http`), not by `alknet-http`
itself.
3. **The default surface is the published contract; custom routes are
not.** ADR-042 (gateway), ADR-045 (versioning), and the non-routing
clauses of ADR-036 that survive ADR-047 govern the default surface.
Custom routes have no alknet-governed compatibility contract; the
deployment owns their stability. This keeps the published-contract
surface small and stable while allowing arbitrary deployment-specific
extension.
4. **axum's composition primitives are sufficient.** `Router::merge`,
`Router::nest`, and axum middleware cover the extension patterns
needed (custom routes, per-route auth opt-out, prefix namespacing).
No alknet-specific routing abstraction is required. If a future
need exceeds axum's composition (e.g., route-level dynamic dispatch),
that would be a separate ADR.
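
As an illustration of Assumption 1, the construction-time composition might look like the sketch below. It models the router as a plain path-to-handler map instead of an `axum::Router`, so `Router`, `default_surface`, and the handler names are hypothetical; only the `extra_routes: Option<Router>` parameter and the rule that the default surface wins on collision come from the ADR.

```rust
use std::collections::BTreeMap;

/// Toy router: maps a path to the name of the handler that serves it.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Router {
    routes: BTreeMap<String, String>,
}

impl Router {
    pub fn route(mut self, path: &str, handler: &str) -> Self {
        self.routes.insert(path.to_string(), handler.to_string());
        self
    }

    pub fn handler_for(&self, path: &str) -> Option<&str> {
        self.routes.get(path).map(String::as_str)
    }
}

/// Assumed default surface: the five gateway endpoints plus /healthz.
fn default_surface() -> Router {
    Router::default()
        .route("/search", "gateway_search")
        .route("/schema", "gateway_schema")
        .route("/call", "gateway_call")
        .route("/batch", "gateway_batch")
        .route("/subscribe", "gateway_subscribe")
        .route("/healthz", "healthz")
}

pub struct HttpAdapter {
    // No setter is exposed: the router is fixed once the adapter is built.
    router: Router,
}

impl HttpAdapter {
    pub fn new(extra_routes: Option<Router>) -> Self {
        let mut router = default_surface();
        if let Some(extra) = extra_routes {
            for (path, handler) in extra.routes {
                // Default surface wins on collision: insert only if the path is free.
                router.routes.entry(path).or_insert(handler);
            }
        }
        HttpAdapter { router }
    }

    pub fn handler_for(&self, path: &str) -> Option<&str> {
        self.router.handler_for(path)
    }
}
```

A deployment would pass `Some(router)` carrying, say, `/v1/chat/completions`; an attempt to shadow `/call` is silently ignored in this model.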
## References
- [ADR-010](010-alpn-router-and-endpoint.md) — static registration at
startup (the `HttpAdapter` router is immutable after construction,
same constraint)
- [ADR-042](042-openapi-gateway-pattern.md) — the gateway endpoints
(the default surface custom routes coexist with; reserved paths)
- [ADR-045](045-to-openapi-gateway-spec-versioning.md) — the published
doc versions the gateway contract, not custom routes
- [ADR-047](047-remove-direct-call-http-surface.md) — the direct-call
surface is removed; the gateway is the sole invoke path (a
deployment that wants the former per-operation HTTP surface builds it
as a custom route projection; this ADR §4 is the mechanism)
- `crates/http/http-server.md` — the `HttpAdapter` spec that gains the
`extra_routes` constructor parameter


@@ -0,0 +1,281 @@
# ADR-047: Remove the Direct-Call HTTP Surface; Gateway Is the Sole Invoke Path
## Status
Proposed
## Supersedes
The "direct path mapping" clause of [ADR-036](036-http-to-call-operation-mapping.md)
§Decision ("Direct path mapping is the default HTTP surface") and §HTTP
method semantics. ADR-036's other clauses (SSE projection, Bearer auth,
`/healthz`, stealth decoy, error mapping, `External`-only dispatch)
remain in force — they are independent of the routing decision and are
reaffirmed by this ADR (see §"What survives from ADR-036").
## Context
ADR-036 defined the HTTP surface as **direct path mapping**:
`POST /{service}/{op}` → `call.requested` for every `External`
operation. An operation `fs/readFile` was served at `POST /fs/readFile`,
one HTTP path per operation — a REST-like surface mirroring the call
protocol's `/{service}/{op}` operation paths. This was the original HTTP
contract, decided before the simplified-contract / gateway-pattern
work landed.
Since then, three shifts made the direct-call surface a contradiction
with the architecture's settled model:
1. **ADR-042** replaced `to_openapi`'s per-operation-paths projection
with the **gateway pattern** — 5 fixed endpoints (`/search`,
`/schema`, `/call`, `/batch`, `/subscribe`) where the per-caller
operation surface is discovered via `AccessControl`-filtered
`/search`, not preloaded into a static doc. The gateway's `/call`
endpoint is the invoke path: `POST /call` with
`{ operation: "/fs/readFile", input: {...} }`. This is the same
RPC-shape pattern MCP uses (`tools/call` with a tool name, ADR-041).
2. **The simplified contract is the few-fixed-endpoints model**, not a
per-operation REST tree. The whole point of the gateway pattern
(ADR-042) was to escape the "static full-surface dump" failure mode
(the Gitea anti-pattern: every operation gets a path, every caller
sees the full surface, per-caller access is an afterthought). The
direct-call surface is that anti-pattern at the HTTP level: every
`External` operation gets an HTTP path, the path exists regardless
of the caller's privilege, and the caller discovers what it can call
by trial-and-error `403`s. The gateway's `/search` exists precisely
to make the per-caller surface the default; the direct-call surface
re-introduces the problem the gateway solved.
3. **ADR-046** added the custom-routes extension point, so a
deployment that genuinely wants a REST-like per-operation HTTP
surface (e.g., to match a legacy API shape) builds it as a custom
route projection (additive, deployment-owned, not the alknet
default contract). The direct-call surface is no longer the only
way to get per-operation HTTP paths; it's the *default* way, and
it's the wrong default.
The result: the HTTP router currently has **two ways to invoke an
operation** — the direct-call surface (`POST /fs/readFile`) and the
gateway (`POST /call` with the operation name in the body). That is the
contradiction: the simplified contract says "a few core endpoints,"
and the direct-call surface is a second, per-operation invoke path that
duplicates the gateway's `/call` with a scheme the gateway was built
to replace. ADR-042's amendment explicitly preserved the direct-call
surface ("unchanged"); that preservation was a leftover from before
the simplified contract was fully thought through, not a deliberate
endorsement of two invoke paths.
### The clean-up
The direct-call surface is residual from early-stage planning, the
same way the pre-ADR-042 `to_openapi` per-operation-paths projection
was residual. ADR-042 cleaned up `to_openapi`; this ADR cleans up the
HTTP handler's routing. The gateway becomes the sole invoke path; the
per-operation HTTP paths go away.
### What about HTTP clients that knew operation names?
A client that previously called `POST /fs/readFile` now calls
`POST /call` with `{ "operation": "/fs/readFile", "input": {...} }`. The
operation name is still the call protocol's `/{service}/{op}` form
(OQ-13, unchanged) — it moves from the HTTP path to the request body.
The gateway's `/call` is the standard invoke endpoint; the direct path
was a REST-like affordance that the simplified contract deliberately
drops. This is a breaking change for any HTTP client built against the
direct-call surface, which is exactly why it needs an ADR — but the
direct-call surface has not been implemented or published yet (the
alknet-http crate is specced, not shipped), so the "break" is
paper-only: no external client depends on it.
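
The client-side change described above can be sketched as a small translation function: it takes the former direct-call path and an already-serialized JSON input and produces the equivalent `POST /call` request. `GatewayRequest`, `to_gateway_call`, and the segment character rule are assumptions made for this example; the body uses the `operation` and `input` keys shown above.

```rust
/// Hypothetical description of the request a client would send to the gateway.
#[derive(Debug, PartialEq)]
pub struct GatewayRequest {
    pub method: &'static str,
    pub path: &'static str,
    pub body: String,
}

/// Assumed rule: segments are non-empty ASCII alphanumerics, '_' or '-',
/// so the operation name never needs JSON escaping.
fn valid_segment(s: &str) -> bool {
    !s.is_empty()
        && s.chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}

/// Translate a former direct-call path such as "/fs/readFile" into the
/// equivalent gateway call. `input_json` must already be valid JSON.
pub fn to_gateway_call(direct_path: &str, input_json: &str) -> Option<GatewayRequest> {
    let mut segments = direct_path.strip_prefix('/')?.split('/');
    let service = segments.next().filter(|s| valid_segment(s))?;
    let op = segments.next().filter(|s| valid_segment(s))?;
    if segments.next().is_some() {
        return None; // only the two-segment /{service}/{op} form is accepted
    }
    Some(GatewayRequest {
        method: "POST",
        path: "/call",
        // The operation name moves from the HTTP path into the body.
        body: format!(
            "{{\"operation\":\"/{}/{}\",\"input\":{}}}",
            service, op, input_json
        ),
    })
}
```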
## Decision
### 1. The gateway is the sole invoke path; the direct-call surface is removed
The `HttpAdapter`'s router serves the **5 fixed gateway endpoints**
(`/search`, `/schema`, `/call`, `/batch`, `/subscribe` — ADR-042) as
the only way to invoke operations over HTTP. There is no
`POST /{service}/{op}` direct-call surface. An HTTP client invokes an
operation by `POST /call` with
`{ "operation": "/{service}/{op}", "input": {...} }`.
The router's operation-invoke surface is the gateway's `/call`
endpoint, not a per-operation path set. The operation name is in the
request body, not the HTTP path — same shape as MCP's `tools/call`
(ADR-041) and the call protocol's own `call.requested`
(`operationId` + `input`).
### 2. The HTTP method semantics move to the gateway endpoints
ADR-036's `OperationType` → HTTP method mapping (`Query` → `GET`,
`Mutation` → `POST`, `Subscription` → `SSE`) no longer applies per-operation
at the HTTP path level, because there are no per-operation HTTP paths.
The gateway endpoints have fixed methods (ADR-042's table):
`/search` `GET`, `/schema` `GET`, `/call` `POST`, `/batch` `POST`,
`/subscribe` `GET` (SSE). The `OperationType` of the *called operation*
is carried in the request/result, not expressed in the HTTP verb — the
client calls `/call` with the operation name; the operation's type is
the registry's concern, not the HTTP method's. A `Query` operation and a
`Mutation` operation both go through `POST /call`; the distinction is
in the operation spec (discovered via `/schema`), not the HTTP surface.
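
The fixed method table above can be written down as a single match over `(method, path)`. The sketch below is illustrative only: `GatewayEndpoint`, `match_gateway`, `OperationType`, and `entry_point` are hypothetical names, and routing subscriptions to `/subscribe` is read off the SSE clause of this ADR, not taken from an implementation.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GatewayEndpoint {
    Search,
    Schema,
    Call,
    Batch,
    Subscribe,
}

/// Fixed methods per gateway endpoint; anything else is not a gateway route.
pub fn match_gateway(method: &str, path: &str) -> Option<GatewayEndpoint> {
    match (method, path) {
        ("GET", "/search") => Some(GatewayEndpoint::Search),
        ("GET", "/schema") => Some(GatewayEndpoint::Schema),
        ("POST", "/call") => Some(GatewayEndpoint::Call),
        ("POST", "/batch") => Some(GatewayEndpoint::Batch),
        ("GET", "/subscribe") => Some(GatewayEndpoint::Subscribe),
        _ => None,
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OperationType {
    Query,
    Mutation,
    Subscription,
}

/// Which gateway entry point a client uses for an operation of a given type.
/// The HTTP verb no longer distinguishes a Query from a Mutation.
pub fn entry_point(op_type: OperationType) -> (&'static str, &'static str) {
    match op_type {
        OperationType::Query | OperationType::Mutation => ("POST", "/call"),
        OperationType::Subscription => ("GET", "/subscribe"),
    }
}
```

Note that `match_gateway("POST", "/fs/readFile")` falls through to `None`: a former direct-call path is simply not part of the gateway table.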
### 3. What survives from ADR-036
ADR-036's routing decision is superseded, but its other clauses are
independent of routing and remain in force:
- **SSE projection for subscriptions over `h2`/`http/1.1`** (§Streaming
projection). The gateway's `/subscribe` endpoint uses this SSE
projection (ADR-042 §2). The framing (`call.responded` → SSE `data:`
frame, `call.completed` → stream close, `call.aborted` → error frame)
is unchanged; it is now the `/subscribe` endpoint's behavior, not a
per-operation SSE stream.
- **Bearer auth** (§Auth). `Authorization: Bearer` →
`resolve_from_token` on every gateway endpoint. Unchanged.
- **`/healthz`** (§`/healthz` and operational endpoints). Raw route, no
auth, no call protocol. Unchanged.
- **Stealth decoy** (§Stealth mode). Unknown paths get the decoy.
Unchanged — and now *all* operation invocations go through the 5
gateway paths, so the "unknown path" surface is larger (anything not
`/search`, `/schema`, `/call`, `/batch`, `/subscribe`, `/healthz`,
`/openapi.json`, the MCP route, or a custom route per ADR-046 is
decoy).
- **Error mapping** (the call `code` → HTTP status table in
http-server.md, ADR-023). The gateway's `/call` endpoint returns the
same error mapping. Unchanged in mechanism; the entry point is
`/call` instead of `/{service}/{op}`.
- **`External`-only dispatch** (Assumption 2). The gateway's `/call`
returns `404` (`NOT_FOUND`) for `Internal` operations, same as the
direct-call surface did. The `AccessControl` check runs on the called
operation regardless of the entry point.
- **Abort cascade on HTTP disconnect** (Consequences, citing ADR-016).
An HTTP client disconnecting mid-`/subscribe` is detected as a stream
close and sends `call.aborted`, cascading to descendants. Unchanged.
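
The SSE projection that `/subscribe` inherits can be sketched as a pure function from call events to stream actions. The `CallEvent` and `SseAction` types are hypothetical, and the `event: error` name for the abort frame is an assumption; the ADR only says an aborted call produces an error frame.

```rust
/// Hypothetical model of the call-protocol events a subscription emits.
#[derive(Debug, PartialEq)]
pub enum CallEvent {
    Responded(String), // payload, already serialized
    Completed,
    Aborted(String), // reason
}

/// What the SSE writer should do for one event.
#[derive(Debug, PartialEq)]
pub enum SseAction {
    Frame(String),
    Close,
}

/// Prefix every line of the payload with "data: ", as the SSE wire format
/// requires for multi-line data, and end the event with a blank line.
fn data_block(payload: &str) -> String {
    let mut out = String::new();
    for line in payload.split('\n') {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    out.push('\n');
    out
}

pub fn project(event: &CallEvent) -> SseAction {
    match event {
        // call.responded -> SSE data frame
        CallEvent::Responded(payload) => SseAction::Frame(data_block(payload)),
        // call.completed -> stream close
        CallEvent::Completed => SseAction::Close,
        // call.aborted -> error frame (event name assumed for this sketch)
        CallEvent::Aborted(reason) => {
            SseAction::Frame(format!("event: error\n{}", data_block(reason)))
        }
    }
}
```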
### 4. A deployment that wants per-operation HTTP paths builds them as custom routes (ADR-046)
A deployment that genuinely needs a REST-like per-operation HTTP
surface (to match a legacy API shape, to serve clients that can't
adapt to the gateway) builds it as a **custom route projection**
(ADR-046): the assembly layer injects an `axum::Router` with
`POST /{service}/{op}` handlers that dispatch into
`OperationRegistry::invoke()`. This is deployment-owned, additive, and
explicitly *not* the alknet default contract — the same status as an
OAI-compatible proxy. The direct-call surface is no longer a built-in
default; it's a projection a deployment can build if it needs it, on
the same extension point as any other custom HTTP surface.
This keeps the default surface small (5 gateway endpoints) while
preserving the *capability* for REST-like access — it just isn't free
by default, which is correct, because the per-operation path surface
has real costs (the static-surface problem) that the gateway avoids.
### 5. `to_openapi` describes the gateway, unchanged
`to_openapi` (ADR-042, ADR-045) already describes the 5 gateway
endpoints, not per-operation paths. Removing the direct-call surface
does not change what `to_openapi` generates — it already generated the
gateway doc. The `info.version` semver (ADR-045) tracks the gateway
contract; the direct-call surface was never in that contract. No change
to `to_openapi` or its versioning.
## Consequences
**Positive:**
- One invoke path over HTTP, not two. The HTTP surface is the 5 gateway
endpoints — exactly the "few core endpoints" of the simplified
contract. The contradiction with the gateway pattern is resolved.
- The per-caller API surface is the default, structurally. An HTTP
client cannot stub its toe on `POST /admin/deleteUser` because that
path does not exist; it calls `/call` with the operation name, and
`/search` tells it what it can call. The Gitea failure mode is
structurally impossible at the HTTP level, not just at the discovery
level.
- The HTTP surface is honest about what the call protocol is: an RPC,
not a REST API. The gateway's `/call` with `{ operation, input }` is
the call protocol's own shape; the direct path mapping was a REST
disguise that didn't fit (the flat JSON input, no path/query/body
split — ADR-042 §"The flat→structured problem").
- A deployment that wants REST-like per-operation paths still can, via
custom routes (ADR-046) — it's an explicit choice with its own costs,
not a default that leaks the static-surface problem into every
deployment.
- No change to `to_openapi` (already described the gateway), to the
SSE projection (now on `/subscribe`), to Bearer auth, to `/healthz`,
to stealth, or to error mapping. The cleanup is narrow: the routing
decision only.
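
To make the per-caller surface concrete, here is a minimal sketch of an access-filtered `/search`: the caller only ever sees the operations it is allowed to invoke. The `Operation` struct and its `allowed_callers` field are hypothetical; the real `AccessControl` model is not described in this ADR.

```rust
/// Hypothetical operation record with a simple allow-list.
pub struct Operation {
    pub name: &'static str,
    pub allowed_callers: &'static [&'static str],
}

/// Return only the operation names the given caller may invoke,
/// preserving registration order.
pub fn search(ops: &[Operation], caller: &str) -> Vec<&'static str> {
    ops.iter()
        .filter(|op| op.allowed_callers.iter().any(|c| *c == caller))
        .map(|op| op.name)
        .collect()
}
```

With this shape, a caller without access to `/admin/deleteUser` never learns the name from discovery, and there is no HTTP path to probe for it either.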
**Negative:**
- An HTTP client that knew an operation name can no longer call it at
a predictable HTTP path. It must call `/call` with the operation name
in the body. This is one layer of indirection, but it's the same
indirection MCP uses and the same shape the call protocol uses
natively. The operation name (OQ-13's `/{service}/{op}` form) is
unchanged — it moves from the path to the body.
- The HTTP surface is RPC-shaped, not REST-shaped. A developer
expecting `POST /fs/readFile` sees `POST /call` with a body instead.
This is honest (the call protocol is a flat JSON RPC, ADR-042 §3), but
it's a departure from the REST conventions ADR-036's direct-call
surface offered. A deployment that needs the REST shape builds it as a
custom route projection (ADR-046).
- The `OperationType` → HTTP method mapping (`Query` → `GET` etc.) no
longer applies at the HTTP level. A `Query` operation and a
`Mutation` operation both go through `POST /call`. The distinction is
in the operation spec (visible via `/schema`), not the HTTP verb. This
loses a small amount of HTTP-level signal (a load balancer can't tell
a read from a write by method), but the call protocol's
`OperationType` was always a registry concern, not an HTTP concern —
the direct-call surface borrowed HTTP verbs to express it, and the
gateway doesn't.
## Assumptions
1. **No external client depends on the direct-call surface.** The
alknet-http crate is specced, not shipped; the direct-call surface
has not been published. Removing it is a paper-only break — no
deployed client breaks. This is why the cleanup is cheap now and
would be expensive after implementation.
2. **The gateway's `/call` is a sufficient invoke path for HTTP
clients.** Any operation callable via `POST /{service}/{op}` is
callable via `POST /call` with the operation name in the body. The
operation name form (`/{service}/{op}`, OQ-13) is unchanged. The
input/output shapes are unchanged. The only difference is where the
operation name lives (path vs body).
3. **A deployment needing REST-like per-operation paths builds them
explicitly.** Via ADR-046 custom routes. This is not a common need —
the gateway's `/call` covers the standard invoke case, and the
OAI-compatible-proxy pattern (ADR-046) covers the "match an external
API shape" case. The direct-call surface was a default that served
neither case particularly well (it wasn't REST-conventional, per
ADR-036 §Negative, and it leaked the static-surface problem).
4. **The gateway endpoints are stable (ADR-042 Assumption 1).**
Removing the direct-call surface does not change the gateway
endpoint set; the 5 endpoints are the published contract. This ADR
narrows the HTTP surface *to* that contract, it does not modify the
contract itself.
## References
- [ADR-036](036-http-to-call-operation-mapping.md) — the ADR whose
routing decision this supersedes (§Decision, §HTTP method semantics);
its other clauses survive (§"What survives from ADR-036")
- [ADR-042](042-openapi-gateway-pattern.md) — the gateway pattern that
made the direct-call surface redundant; its amendment to ADR-036
preserved the direct-call surface, which this ADR reverses
- [ADR-044](044-defer-webtransport-browsers-use-websocket.md) —
WebSocket is the browser bidirectional path (the direct-call surface
was the `h2`/`http/1.1` one-directional path; removing it does not
affect WebSocket, which carries the call protocol natively)
- [ADR-046](046-assembly-layer-custom-http-routes.md) — the extension
point a deployment uses to build a per-operation HTTP surface if it
needs one (the direct-call surface's replacement for the rare case)
- [ADR-045](045-to-openapi-gateway-spec-versioning.md) — `to_openapi`
versions the gateway contract (unchanged; the direct-call surface
was never in the contract)
- OQ-13 (resolved) — operation path format `/{service}/{op}` is
unchanged; it moves from the HTTP path to the `/call` request body
- `crates/http/http-server.md` — the spec whose router surface this ADR
narrows to the gateway endpoints