From 85ff26ae51e1045207da9ef7ae4fc8f963e2a913 Mon Sep 17 00:00:00 2001
From: "glm-5.1" <glm-5.1@alk.dev>
Date: Wed, 27 May 2026 05:22:49 +0000
Subject: [PATCH] ADR-018 amendment: meta-tools replace per-operation tool
 conversion

Instead of flattening the entire operation registry into individual
OpenAI tool definitions (with schema normalization headaches), the
AgentLoop provides the same 4 discovery+call meta-tools that the MCP
server already exposes: hub.list, hub.search, hub.schema, hub.call.

This eliminates operationToOpenAITool() + normalizeSchemaForOpenAI()
entirely. The LLM discovers and calls operations the same way an MCP
client does. The context always holds the same 4 tool definitions
(not N), tool calls take the same call protocol path
(callMap.call()), and MCP and direct agent consumers share one
interface.
---
 .../ADR-018-no-ai-sdk-direct-openai-proxy.md  | 109 +++++++++---------
 1 file changed, 57 insertions(+), 52 deletions(-)
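
Reviewer note (sits in the patch notes area, so it is not part of the commit): a minimal sketch of how the 4 meta-tools could be wrapped as OpenAI-style function definitions and how a `hub.call` batch could be dispatched concurrently. Everything here is an assumption for illustration: `FunctionTool`, `metaTool`, `META_TOOLS`, `toHubName`, `CallFn`, and `runHubCall` are hypothetical names, and `CallFn` only stands in for `callMap.call()`, whose real signature is not shown in this patch. The sketch uses underscore wire names (`hub_list`) because OpenAI's API documents function names as limited to letters, digits, underscores, and dashes; whether the hub keeps dotted names for other OpenAI-compatible providers is left open.

```typescript
// Hypothetical local types; the real hub and openai SDK types are not used here.
type JsonSchema = Record<string, unknown>;

interface FunctionTool {
  type: "function";
  function: { name: string; description: string; parameters: JsonSchema };
}

// Wraps a hand-written schema as one OpenAI-style function tool definition.
function metaTool(
  name: string,
  description: string,
  properties: JsonSchema,
  required: string[] = [],
): FunctionTool {
  return {
    type: "function",
    function: { name, description, parameters: { type: "object", properties, required } },
  };
}

// The same 4 tools regardless of how many operations the registry holds.
const META_TOOLS: FunctionTool[] = [
  metaTool("hub_list", "List available operations, optionally filtered by namespace", {
    namespace: { type: "string" },
  }),
  metaTool("hub_search", "Search operations by query string and/or namespace", {
    q: { type: "string" },
    namespace: { type: "string" },
  }),
  metaTool("hub_schema", "Get the input and output schemas for one operation", {
    tool: { type: "string" },
  }, ["tool"]),
  metaTool("hub_call", "Execute one or more operations via the call protocol", {
    calls: {
      type: "array",
      items: {
        type: "object",
        properties: { tool: { type: "string" }, input: { type: "object" } },
        required: ["tool"],
      },
    },
  }, ["calls"]),
];

// Maps an assumed wire name back to the dotted hub name ("hub_call" -> "hub.call").
const toHubName = (wireName: string): string => wireName.replace("_", ".");

interface CallRequest { tool: string; input?: unknown }
type CallResult =
  | { success: true; result: unknown }
  | { success: false; error: string };

// Stand-in for the call protocol entry point; the signature is assumed.
type CallFn = (tool: string, input: unknown) => Promise<unknown>;

// Runs every call in the batch concurrently and reports per-call success or error,
// so one failing operation does not reject the whole batch.
async function runHubCall(calls: CallRequest[], call: CallFn): Promise<CallResult[]> {
  return Promise.all(
    calls.map(async ({ tool, input }): Promise<CallResult> => {
      try {
        return { success: true, result: await call(tool, input ?? {}) };
      } catch (err) {
        return { success: false, error: err instanceof Error ? err.message : String(err) };
      }
    }),
  );
}

// Example: a fake call function in place of the real call protocol.
const fakeCall: CallFn = async (tool, input) => ({ tool, input });
runHubCall(
  [{ tool: "coord.status" }, { tool: "coord.message", input: { body: "hi" } }],
  fakeCall,
).then((results) => console.log(results));
```

Catching errors per call keeps the `{ success, result/error }[]` shape from the ADR's tool table intact even when individual operations fail, which is one plausible reading of that table rather than confirmed behavior.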

diff --git a/docs/decisions/ADR-018-no-ai-sdk-direct-openai-proxy.md b/docs/decisions/ADR-018-no-ai-sdk-direct-openai-proxy.md
index ab4235f..e4d8fe4 100644
--- a/docs/decisions/ADR-018-no-ai-sdk-direct-openai-proxy.md
+++ b/docs/decisions/ADR-018-no-ai-sdk-direct-openai-proxy.md
@@ -2,6 +2,7 @@
 
 - **Status**: Accepted
 - **Date**: 2026-05-26
+- **Amended**: 2026-05-27 — meta-tools replace per-operation tool conversion
 - **Deciders**: alkdev
 
 ## Context
@@ -38,11 +39,12 @@ The supply chain risk assessment ([ai-sdk-supply-chain-risk.md](../research/ai-s
 
 ### What we actually need from the AI SDK
 
-The AI SDK provides three things the hub's architecture references:
+The AI SDK provides two things the hub's architecture references:
 
 1. **`UIMessage` format** — role + parts array for session messages
 2. **`streamText()`/`generateText()`** — LLM calling with streaming, tool execution, and multi-step agent loops
-3. **`tool()` + `operationToTool()`** — bridging the operations registry to AI SDK tool definitions
+
+The tool bridging problem (`tool()` + `operationToTool()`) is already solved by the MCP meta-tools model (`hub.list`/`hub.search`/`hub.schema`/`hub.call`) — the same 4 discovery+call tools that MCP clients use. These become the OpenAI tool definitions for direct agents. No per-operation schema conversion needed.
 
 The proxy is already architecturally committed — `agent-sessions.md` describes `/v1/chat/completions` as a Hono HTTP endpoint. The question is whether we call OpenAI-compatible APIs through the AI SDK or directly through the `openai` npm package.
 
@@ -52,8 +54,9 @@ After ADR-015 removed the opencode integration, the AI SDK's role narrowed signi
 
 The only place the AI SDK was used was for "direct agents" running in the hub process. These agents:
 - Read messages from Postgres
-- Convert operations to tools
+- Provide 4 meta-tools (hub.list/search/schema/call) as OpenAI tool definitions
 - Call an LLM via `streamText()` (which handles multi-step tool execution internally)
+- Route tool calls through the call protocol (same path as MCP clients), not per-operation adapters
 - Persist the response parts back to Postgres
 
 This is a bounded loop that the hub can implement directly, without the AI SDK's multi-provider abstraction, React hooks, or streaming protocol layers.
@@ -66,7 +69,7 @@ Remove the Vercel AI SDK as a dependency. The hub will:
 
 2. **Use the `openai` npm package directly** for LLM calls. Zero runtime dependencies, well-maintained, auto-generated from OpenAPI spec, compatible with Deno via npm specifiers.
 
-3. **Map operations to OpenAI tool calling format directly** — no `tool()` adapter needed. The operations registry already stores JSON Schema (via TypeBox). Converting `IOperationDefinition.inputSchema` to OpenAI's `{ type: "function", function: { name, description, parameters } }` format is a JSON Schema transform with normalization.
+3. **Use MCP meta-tools as OpenAI tool definitions** — instead of flattening the entire operation registry into individual OpenAI tool definitions (one per operation, with schema normalization headaches), the LLM gets the same 4 discovery+call tools that the MCP server already exposes: `hub.list`, `hub.search`, `hub.schema`, `hub.call`. The LLM discovers and calls operations the same way an MCP client does. This eliminates `operationToOpenAITool()` + `normalizeSchemaForOpenAI()` entirely. See [mcp-server.md](../architecture/mcp-server.md) for the tool definitions and the rationale for this pattern (N operations ≠ N tool defs in context).
 
 4. **Implement hub-own streaming** for the proxy's SSE output. The proxy receives OpenAI SSE chunks and transforms them into the hub's stream format — a subset of the AI SDK's `UIMessageChunk` protocol that covers the part types the hub uses.
 
@@ -86,7 +89,9 @@ Dev Spoke → HTTP POST → Hub Proxy → Provider
 ```
 Direct Agent → AgentLoop → openai SDK → Hub Proxy → Provider
                          ↕
-                operationToOpenAITool() → registry.execute()
+                hub.call/list/search/schema (4 meta-tools, same as MCP)
+                ↕
+                callMap.call() (call protocol) → registry.execute()
 Dev Spoke → HTTP POST → Hub Proxy → Provider
 ```
 
@@ -102,14 +107,16 @@ The AI SDK's `streamText()` handles multi-step tool execution internally: detect
 ┌─────────────────────────────────────────────────────┐
 │  1. Load session messages from Postgres              │
 │  2. Convert to OpenAI chat message format            │
-│  3. Convert hub operations to OpenAI tool definitions │
+│  3. Provide 4 meta-tools as OpenAI tool definitions  │
+│     (hub.list, hub.search, hub.schema, hub.call)     │
 │  4. Call LLM (via openai SDK, streaming)              │
 │  5. Emit stream events to client (SSE)               │
 │  6. Accumulate response                              │
 │  7. If response contains tool_calls:                  │
 │     a. Emit step-finish event                        │
 │     b. For each tool_call:                           │
-│        - Execute via registry.execute()               │
+│        - Route through callMap.call() (call protocol) │
+│        - This gives call graph tracking, abort, etc.  │
 │        - Emit tool-output-available event             │
 │     c. Append tool results to messages               │
 │     d. Emit step-start event                          │
@@ -129,7 +136,7 @@ The AI SDK's `streamText()` handles multi-step tool execution internally: detect
 
 **Usage tracking**: The `stream_options: { include_usage: true }` parameter is sent with each LLM call. The final step's usage data (prompt tokens, completion tokens) is accumulated across all steps and included in the `finish` event. The hub's `clients` type `llm-provider` stores cost metadata; the session's `data` column records total usage per turn.
 
-**Concurrent tool calls**: OpenAI responses can include multiple tool calls in a single response. The hub executes all tool calls in a step concurrently (via `Promise.all`), collects results, then continues the loop. All tool results are appended to messages before the next LLM call.
+**Concurrent tool calls**: OpenAI responses can include multiple tool calls in a single response. The hub executes all tool calls in a step concurrently (via `Promise.all` over `callMap.call()` invocations), collects results, then continues the loop. All tool results are appended to messages before the next LLM call. The LLM can also batch independent calls in a single `hub.call` invocation (since `hub.call` accepts an array), which is more token-efficient.
 
 ### `UIMessage` type ownership
 
@@ -174,41 +181,38 @@ This is a **starting subset** of the AI SDK's part types (which includes `source
 
 **Note on `metadata`**: The `metadata` field is typed as a structured object (not `unknown`) because the hub always populates it with model, provider, usage, and finish reason data from the LLM response. The `[key: string]: unknown` index signature allows extensibility without losing type safety for the known fields.
 
-### Operation → OpenAI tool mapping
+### Meta-tools: same interface as MCP server
 
-```ts
-function operationToOpenAITool(spec: IOperationDefinition): OpenAI.FunctionDefinition {
-  const schema = normalizeSchemaForOpenAI(spec.inputSchema);
-  return {
-    type: "function",
-    function: {
-      name: `${spec.namespace}.${spec.name}`,
-      description: spec.description,
-      parameters: schema,
-      strict: true,  // enable structured outputs when the operation schema supports it
-    },
-  };
-}
+Instead of flattening the entire operation registry into individual OpenAI tool definitions (one per operation, with all the schema normalization that requires), the AgentLoop provides the LLM with the same **4 discovery+call meta-tools** that the MCP server already exposes (see [mcp-server.md](../architecture/mcp-server.md)):
 
-/**
- * TypeBox produces JSON Schema, but OpenAI function calling has specific requirements:
- * - Top-level must be object type with properties
- * - additionalProperties: false at top level (required for strict mode)
- * - nested $ref needs resolution (TypeBox typically produces inline schemas)
- * - patternProperties, oneOf/anyOf with complex merging may not translate
- * This function normalizes TypeBox output for OpenAI compatibility.
- */
-function normalizeSchemaForOpenAI(schema: Record<string, unknown>): Record<string, unknown> {
-  // ~30-50 lines of normalization:
-  // 1. Ensure top-level type: "object"
-  // 2. Set additionalProperties: false for strict mode
-  // 3. Strip unsupported keywords (patternProperties, etc.)
-  // 4. Resolve $ref if present (unusual for TypeBox, but defensive)
-  // ...
-}
+| Tool | Input | Output | Description |
+|------|-------|--------|-------------|
+| `hub.list` | `{ namespace?: string }` | `OperationSpec[]` | List available operations, optionally filtered by namespace |
+| `hub.search` | `{ q?: string, namespace?: string }` | `{ tool, description }[]` | Search operations by query string and/or namespace |
+| `hub.schema` | `{ tool: string }` | `{ inputSchema, outputSchema }` | Get TypeBox schemas for a specific operation |
+| `hub.call` | `{ calls: [{ tool, input? }] }` | `{ success, result/error }[]` | Execute operations via call protocol (supports batch) |
+
+**Why this matters for OpenAI integration**: Each of these 4 tools has a small, stable schema. The LLM's context always contains just 4 tool definitions, not N. The LLM discovers what it needs (`hub.search`, `hub.schema`), then calls it (`hub.call`). This is the same pattern that works for MCP clients, and it works identically for OpenAI tool-calling agents. No schema normalization is needed — the 4 schemas are hand-defined and fully under the hub's control.
+
+**`hub.call` routes through `callMap.call()`** (the call protocol), not `registry.execute()` directly. This gives full call graph tracking, abort cascading, and structured error handling — the same for both MCP clients and direct agents.
+
+**Batch calls**: `hub.call` accepts an array of `{ tool, input }` pairs and returns an array of results. This replaces the previous "batch by default" concept with a single, explicit tool. The LLM can batch multiple independent calls in a single tool invocation, which is more token-efficient than making N separate calls.
+
+**Agent workflow** (same as MCP workflow from mcp-server.md):
+```
+Agent: "I need to spawn a worktree for the auth feature"
+  → hub.search({ q: "spawn" })          → [{ tool: "coord.spawn", description: "..." }]
+  → hub.schema({ tool: "coord.spawn" }) → { inputSchema: { sessionId, task, branch, ... }, ... }
+  → hub.call({ calls: [{ tool: "coord.spawn", input: { ... } }] })
+
+Agent: "Let me also check status and send a message"
+  → hub.call({ calls: [
+      { tool: "coord.status", input: { parentSessionId: "..." } },
+      { tool: "coord.message", input: { sessionId: "...", body: "..." } }
+    ] })
 ```
 
-No adapter layer, no `tool()` wrapper, no AI SDK dependency. The operations registry already stores JSON Schema via TypeBox. The normalization step is necessary because OpenAI's function calling API has stricter JSON Schema requirements than TypeBox's default output.
+**No `operationToOpenAITool()` or `normalizeSchemaForOpenAI()` needed**: The 4 meta-tool schemas are hand-defined, small, and don't change when operations are added or modified. The previous approach of converting each `IOperationDefinition.inputSchema` (TypeBox → JSON Schema) to OpenAI's function calling format required normalization (OpenAI requires `additionalProperties: false`, top-level `type: "object"`, no `$ref`, no `patternProperties`). That entire problem disappears.
 
 ### Streaming format for the proxy
 
@@ -269,7 +273,7 @@ Per project convention (AGENTS.md: "Pin dependency versions in deno.json — upd
 | Document | Change | Status |
 |----------|--------|--------|
 | `AGENTS.md` | Remove AI SDK from External Dependencies and Constraints. Add `openai` with pinned version. Update `src/inference/` description. | ✅ Done |
-| `docs/architecture/agent-sessions.md` | Remove `streamText`/`generateText`/`proxyProvider`/`operationToTool` references. Replace with `AgentLoop` using `openai` SDK and `operationToOpenAITool` mapping. Update session data shapes. | Pending |
+| `docs/architecture/agent-sessions.md` | Remove `streamText`/`generateText`/`proxyProvider`/`operationToTool` references. Replace with `AgentLoop` using `openai` SDK and 4 meta-tools (hub.list/search/schema/call). Document unified MCP/direct agent tool interface. Update session data shapes. | Pending |
 | `docs/architecture/open-questions.md` | Add OQ-63, OQ-64, OQ-65. Add Theme 11. Add ADR-018 to resolved table. Add inference chain to cross-cutting dependencies. | ✅ Done |
 | `docs/architecture/packages.md` | Replace "Agent sessions (AI SDK)" with "Agent sessions (openai SDK + AgentLoop)" or similar. | Pending |
 
@@ -280,38 +284,38 @@ Per project convention (AGENTS.md: "Pin dependency versions in deno.json — upd
 1. **Reduced supply chain attack surface**: Zero transitive dependencies from the LLM calling path. The `openai` package has zero runtime dependencies and is auto-generated from OpenAPI spec.
 2. **No AI SDK release cadence coupling**: We update the `openai` package on our schedule, not at 2-5 releases/day.
 3. **Reduced bundle size**: The AI SDK core (`ai`) is ~50 kB minified, `@ai-sdk/provider` adds ~19.5 kB, plus `@ai-sdk/provider-utils` and transitive deps. The `openai` package is ~129.5 kB but with zero transitive deps — total install footprint is significantly smaller than `ai` + its dependency tree. More importantly, the hub's own streaming code (~300 LOC for the SSE transformer + AgentLoop) is a fraction of the AI SDK's ~2700 lines of `streamText()` alone, and we only ship what we use.
-3. **Hub-own streaming protocol**: We define and evolve the SSE chunk types we need without waiting for AI SDK releases. New part types or chunk types can be added immediately.
-4. **Simpler code paths**: No `proxyProvider()` factory, no `operationToTool()` adapter, no `LanguageModelV3` interface implementation. Direct `openai` SDK calls + JSON Schema tool definitions + explicit `AgentLoop`.
-5. **Consistent with existing patterns**: The operations registry already uses TypeBox → JSON Schema. Mapping operations to OpenAI tool format is a JSON Schema transform, not an adapter to a third-party type system.
-6. **Consistent with ADR-015 and ADR-016**: We've removed opencode's influence on the hub's data model. Removing the AI SDK continues this pattern — the hub owns its types, its streaming protocol, and its tool calling format.
-7. **Explicit agent loop**: The `AgentLoop` is hub code that we can debug, extend, and add observability to. Multi-step tool execution, max steps, error recovery, and usage tracking are all visible and modifiable. The AI SDK's `streamText()` hides this loop inside ~2700 lines of framework code.
+4. **Hub-own streaming protocol**: We define and evolve the SSE chunk types we need without waiting for AI SDK releases. New part types or chunk types can be added immediately.
+5. **Unified tool interface for MCP and direct agents**: The same 4 discovery+call meta-tools (`hub.list`/`hub.search`/`hub.schema`/`hub.call`) serve both MCP clients and OpenAI tool-calling agents. No per-operation schema conversion, no `normalizeSchemaForOpenAI()`, no adapter layers. The LLM discovers operations the same way an MCP client does — search, get schema, call.
+6. **Consistent with existing patterns**: The operations registry and call protocol are unchanged. Direct agents route tool calls through `callMap.call()` — the same path as MCP clients and spoke calls.
+7. **Consistent with ADR-015 and ADR-016**: We've removed opencode's influence on the hub's data model. Removing the AI SDK continues this pattern — the hub owns its types, its streaming protocol, and its tool calling format.
+8. **Explicit agent loop**: The `AgentLoop` is hub code that we can debug, extend, and add observability to. Multi-step tool execution, max steps, error recovery, and usage tracking are all visible and modifiable. The AI SDK's `streamText()` hides this loop inside ~2700 lines of framework code.
 
 ### Negative
 
 1. **More code to maintain**: The `AgentLoop`, streaming state machine, and tool execution orchestration are additional hub code. However, this code is bounded (~900 lines total), well-understood (LLM → tool call → execute → feed result → repeat), and has clear input/output formats. The AI SDK's equivalent is ~2700 lines of `streamText()` + the provider abstraction + the tool framework.
 2. **No multi-provider abstraction**: The AI SDK lets you swap providers with one line (`anthropic(...)` → `openai(...)`). With the `openai` SDK, we're locked to OpenAI-compatible APIs. But the hub's proxy already abstracts this — all LLM calls go through `/v1/chat/completions`, and the proxy routes to providers. Adding a new provider means adding a route in the proxy, not swapping AI SDK providers. For providers that don't support OpenAI-compatible APIs (e.g., Anthropic native), the proxy translates the format.
 3. **No AI SDK React hooks**: We can't use `useChat` or `useCompletion` on the frontend. The hub doesn't have a React frontend — it has an API server. Frontend concerns are out of scope.
-4. **Tool calling type safety**: The AI SDK's `tool()` function provides Zod-based type safety for tool input/output. We lose that. But our operations registry already provides TypeBox-based type safety — we're mapping TypeBox schemas to OpenAI's `parameters` field, which is JSON Schema (which TypeBox produces natively).
+4. **No per-operation type safety in tool calls**: The AI SDK's `tool()` function provides Zod-based type safety for each tool's input/output. With meta-tools, `hub.call` accepts `{ tool: string, input?: unknown }`: the LLM gets the schema via `hub.schema` and constructs the input. The call protocol validates the input against the operation's TypeBox schema at execution time, so invalid inputs are caught, but only at runtime; hub code gets no compile-time type checking for individual operation calls. This is the same trade-off MCP clients accept, and it is inherent to the discovery+call model.
 
 ### Implementation scope
 
 | Component | Estimated effort | Notes |
 |-----------|-----------------|-------|
 | `UIMessage` + `UIPart` + `ToolCallState` type definitions | Small (~60 lines) | Plain TypeScript interfaces |
-| `operationToOpenAITool()` + schema normalization | Small-Medium (~80 lines) | JSON Schema normalization for OpenAI strict mode (~30-50 lines) + mapping |
+| 4 meta-tool OpenAI definitions + schema | Small (~40 lines) | `hub.list`/`hub.search`/`hub.schema`/`hub.call` — same schemas as MCP server, wrapped as OpenAI function definitions |
 | OpenAI proxy SSE handler (Hono) | Medium (~250 lines) | Transform OpenAI SSE → hub chunk format, includes step boundary events |
-| `AgentLoop` — multi-step tool execution loop | Medium (~300 lines) | Step management, tool call detection, tool execution via registry, result feeding, max steps, usage accumulation |
+| `AgentLoop` — multi-step tool execution loop | Medium (~250 lines) | Step management, tool call detection, routing via callMap.call(), result feeding, max steps, usage accumulation. Simpler than before — tool calls all go through the same call protocol path. |
 | Direct agent stream consumer | Small (~80 lines) | Consume `openai` SDK streaming response, emit hub chunk events |
 | Part persistence from stream | Medium-Large (~250 lines) | Map stream chunks to `parts` table inserts/updates, buffered write strategy (flush on `*-end` events), state transitions |
 | Proxy key routing | Small (~50 lines) | Resolve `clients` + `client_secrets` for provider keys |
 | Error handling + retry logic | Small-Medium (~80 lines) | Exponential backoff for 429/5xx, non-retryable error mapping |
 
-**Total: ~1100 lines** of focused, well-bounded code with clear input/output formats.
+**Total: ~1060 lines** of focused, well-bounded code with clear input/output formats.
 
 The `AgentLoop` is the most significant component. Its contract is simple:
-- **Input**: messages + tool definitions + model config
+- **Input**: messages + 4 meta-tool definitions + model config
 - **Output**: SSE stream of hub chunk events + final UIMessage + usage data
-- **Loop**: call → accumulate → detect tools → execute → feed → repeat
+- **Loop**: call → accumulate → detect tools → route through callMap.call() → feed → repeat
 
 The AI SDK's `streamText()` handles this loop in ~2700 lines (including provider abstraction, middleware hooks, multi-model smoothing, and edge cases we don't need). Our `AgentLoop` handles exactly our use case in ~300 lines.
 
@@ -320,7 +324,7 @@ The AI SDK's `streamText()` handles this loop in ~2700 lines (including provider
 | OQ | Impact |
 |----|--------|
 | OQ-16 | **Simplified**: ADR-016 resolved this — hub owns its schema. This ADR extends that to TypeScript types. The hub defines `UIMessage`, `UIPart`, and `ToolCallState` types. |
-| Agent sessions architecture (`agent-sessions.md`) | **Needs update**: Remove `streamText`/`generateText`/`proxyProvider`/`operationToTool` references. Replace with `AgentLoop` using `openai` SDK and `operationToOpenAITool` mapping. Document the two streaming paths producing the same output format. |
+| Agent sessions architecture (`agent-sessions.md`) | **Needs update**: Remove `streamText`/`generateText`/`proxyProvider`/`operationToTool` references. Replace with `AgentLoop` using `openai` SDK and 4 meta-tools. Document the unified MCP/direct agent tool interface. |
 | `AGENTS.md` Constraints and Dependencies | **Needs update**: Remove AI SDK from dependencies and constraints. Add `openai` package with pinned version. Update `src/inference/` description. |
 
 ### Open questions created
@@ -335,6 +339,7 @@ The AI SDK's `streamText()` handles this loop in ~2700 lines (including provider
 
 - [ADR-015: Dev spoke instead of opencode integration](ADR-015-dev-spoke-not-opencode.md) — removed opencode dependency
 - [ADR-016: Hub-own schema](ADR-016-hub-own-schema.md) — hub owns session/message/part schema
+- [MCP server: Discovery + Call Interface](../architecture/mcp-server.md) — the 4 meta-tools model that now serves both MCP and direct agents
 - [AI SDK supply chain risk assessment](../research/ai-sdk-supply-chain-risk.md) — detailed analysis of AI SDK risks
 - [agent-sessions.md](../architecture/agent-sessions.md) — current session architecture (references AI SDK)
 - [OpenAI Node SDK](https://github.com/openai/openai-node) — zero-dependency, auto-generated from OpenAPI spec
\ No newline at end of file