diff --git a/docs/decisions/ADR-018-no-ai-sdk-direct-openai-proxy.md b/docs/decisions/ADR-018-no-ai-sdk-direct-openai-proxy.md index ab4235f..e4d8fe4 100644 --- a/docs/decisions/ADR-018-no-ai-sdk-direct-openai-proxy.md +++ b/docs/decisions/ADR-018-no-ai-sdk-direct-openai-proxy.md @@ -2,6 +2,7 @@ - **Status**: Accepted - **Date**: 2026-05-26 +- **Amended**: 2026-05-27 — meta-tools replace per-operation tool conversion - **Deciders**: alkdev ## Context @@ -38,11 +39,12 @@ The supply chain risk assessment ([ai-sdk-supply-chain-risk.md](../research/ai-s ### What we actually need from the AI SDK -The AI SDK provides three things the hub's architecture references: +The AI SDK provides two things the hub's architecture references: 1. **`UIMessage` format** — role + parts array for session messages 2. **`streamText()`/`generateText()`** — LLM calling with streaming, tool execution, and multi-step agent loops -3. **`tool()` + `operationToTool()`** — bridging the operations registry to AI SDK tool definitions + +The tool bridging problem (`tool()` + `operationToTool()`) is already solved by the MCP meta-tools model (`hub.list`/`hub.search`/`hub.schema`/`hub.call`) — the same 4 discovery+call tools that MCP clients use. These become the OpenAI tool definitions for direct agents. No per-operation schema conversion needed. The proxy is already architecturally committed — `agent-sessions.md` describes `/v1/chat/completions` as a Hono HTTP endpoint. The question is whether we call OpenAI-compatible APIs through the AI SDK or directly through the `openai` npm package. @@ -52,8 +54,9 @@ After ADR-015 removed the opencode integration, the AI SDK's role narrowed signi The only place the AI SDK was used was for "direct agents" running in the hub process. 
These agents: - Read messages from Postgres -- Convert operations to tools +- Provide 4 meta-tools (hub.list/search/schema/call) as OpenAI tool definitions - Call an LLM via `streamText()` (which handles multi-step tool execution internally) +- Tool calls route through the call protocol (same path as MCP clients), not per-operation adapters - Persist the response parts back to Postgres This is a bounded loop that the hub can implement directly, without the AI SDK's multi-provider abstraction, React hooks, or streaming protocol layers. @@ -66,7 +69,7 @@ Remove the Vercel AI SDK as a dependency. The hub will: 2. **Use the `openai` npm package directly** for LLM calls. Zero runtime dependencies, well-maintained, auto-generated from OpenAPI spec, compatible with Deno via npm specifiers. -3. **Map operations to OpenAI tool calling format directly** — no `tool()` adapter needed. The operations registry already stores JSON Schema (via TypeBox). Converting `IOperationDefinition.inputSchema` to OpenAI's `{ type: "function", function: { name, description, parameters } }` format is a JSON Schema transform with normalization. +3. **Use MCP meta-tools as OpenAI tool definitions** — instead of flattening the entire operation registry into individual OpenAI tool definitions (one per operation, with schema normalization headaches), the LLM gets the same 4 discovery+call tools that the MCP server already exposes: `hub.list`, `hub.search`, `hub.schema`, `hub.call`. The LLM discovers and calls operations the same way an MCP client does. This eliminates `operationToOpenAITool()` + `normalizeSchemaForOpenAI()` entirely. See [mcp-server.md](../architecture/mcp-server.md) for the tool definitions and the rationale for this pattern (N operations should not mean N tool definitions in the LLM's context). 4. **Implement hub-own streaming** for the proxy's SSE output. 
The proxy receives OpenAI SSE chunks and transforms them into the hub's stream format — a subset of the AI SDK's `UIMessageChunk` protocol that covers the part types the hub uses. @@ -86,7 +89,9 @@ Dev Spoke → HTTP POST → Hub Proxy → Provider ``` Direct Agent → AgentLoop → openai SDK → Hub Proxy → Provider ↕ - operationToOpenAITool() → registry.execute() + hub.call/list/search/schema (4 meta-tools, same as MCP) + ↕ + callMap.call() (call protocol) → registry.execute() Dev Spoke → HTTP POST → Hub Proxy → Provider ``` @@ -102,14 +107,16 @@ The AI SDK's `streamText()` handles multi-step tool execution internally: detect ┌─────────────────────────────────────────────────────┐ │ 1. Load session messages from Postgres │ │ 2. Convert to OpenAI chat message format │ -│ 3. Convert hub operations to OpenAI tool definitions │ +│ 3. Provide 4 meta-tools as OpenAI tool definitions │ +│ (hub.list, hub.search, hub.schema, hub.call) │ │ 4. Call LLM (via openai SDK, streaming) │ │ 5. Emit stream events to client (SSE) │ │ 6. Accumulate response │ │ 7. If response contains tool_calls: │ │ a. Emit step-finish event │ │ b. For each tool_call: │ -│ - Execute via registry.execute() │ +│ - Route through callMap.call() (call protocol) │ +│ - This gives call graph tracking, abort, etc. │ │ - Emit tool-output-available event │ │ c. Append tool results to messages │ │ d. Emit step-start event │ @@ -129,7 +136,7 @@ The AI SDK's `streamText()` handles multi-step tool execution internally: detect **Usage tracking**: The `stream_options: { include_usage: true }` parameter is sent with each LLM call. The final step's usage data (prompt tokens, completion tokens) is accumulated across all steps and included in the `finish` event. The hub's `clients` type `llm-provider` stores cost metadata; the session's `data` column records total usage per turn. -**Concurrent tool calls**: OpenAI responses can include multiple tool calls in a single response. 
The hub executes all tool calls in a step concurrently (via `Promise.all`), collects results, then continues the loop. All tool results are appended to messages before the next LLM call. +**Concurrent tool calls**: OpenAI responses can include multiple tool calls in a single response. The hub executes all tool calls in a step concurrently (via `Promise.all` over `callMap.call()` invocations), collects results, then continues the loop. All tool results are appended to messages before the next LLM call. The LLM can also batch independent calls in a single `hub.call` invocation (since `hub.call` accepts an array), which is more token-efficient. ### `UIMessage` type ownership @@ -174,41 +181,38 @@ This is a **starting subset** of the AI SDK's part types (which includes `source **Note on `metadata`**: The `metadata` field is typed as a structured object (not `unknown`) because the hub always populates it with model, provider, usage, and finish reason data from the LLM response. The `[key: string]: unknown` index signature allows extensibility without losing type safety for the known fields. 
-### Operation → OpenAI tool mapping +### Meta-tools: same interface as MCP server -```ts -function operationToOpenAITool(spec: IOperationDefinition): OpenAI.FunctionDefinition { - const schema = normalizeSchemaForOpenAI(spec.inputSchema); - return { - type: "function", - function: { - name: `${spec.namespace}.${spec.name}`, - description: spec.description, - parameters: schema, - strict: true, // enable structured outputs when the operation schema supports it - }, - }; -} +Instead of flattening the entire operation registry into individual OpenAI tool definitions (one per operation, with all the schema normalization that requires), the AgentLoop provides the LLM with the same **4 discovery+call meta-tools** that the MCP server already exposes (see [mcp-server.md](../architecture/mcp-server.md)): -/** - * TypeBox produces JSON Schema, but OpenAI function calling has specific requirements: - * - Top-level must be object type with properties - * - additionalProperties: false at top level (required for strict mode) - * - nested $ref needs resolution (TypeBox typically produces inline schemas) - * - patternProperties, oneOf/anyOf with complex merging may not translate - * This function normalizes TypeBox output for OpenAI compatibility. - */ -function normalizeSchemaForOpenAI(schema: Record): Record { - // ~30-50 lines of normalization: - // 1. Ensure top-level type: "object" - // 2. Set additionalProperties: false for strict mode - // 3. Strip unsupported keywords (patternProperties, etc.) - // 4. Resolve $ref if present (unusual for TypeBox, but defensive) - // ... 
-} +| Tool | Input | Output | Description | +|------|-------|--------|-------------| +| `hub.list` | `{ namespace?: string }` | `OperationSpec[]` | List available operations, optionally filtered by namespace | +| `hub.search` | `{ q?: string, namespace?: string }` | `{ tool, description }[]` | Search operations by query string and/or namespace | +| `hub.schema` | `{ tool: string }` | `{ inputSchema, outputSchema }` | Get TypeBox schemas for a specific operation | +| `hub.call` | `{ calls: [{ tool, input? }] }` | `{ success, result/error }[]` | Execute operations via call protocol (supports batch) | + +**Why this matters for OpenAI integration**: Each of these 4 tools has a small, stable schema. The LLM's context always contains just 4 tool definitions, not N. The LLM discovers what it needs (`search`, `schema`), then calls it. This is the same pattern that works for MCP clients, and it works identically for OpenAI tool-calling agents. No schema normalization is needed — the 4 schemas are hand-defined and fully under the hub's control. + +**`hub.call` routes through `callMap.call()`** (the call protocol), not `registry.execute()` directly. This gives full call graph tracking, abort cascading, and structured error handling — the same for both MCP clients and direct agents. + +**Batch calls**: `hub.call` accepts an array of `{ tool, input }` pairs and returns an array of results. This replaces the previous "batch by default" concept with a single, explicit tool. The LLM can batch multiple independent calls in a single tool invocation, which is more token-efficient than making N separate calls. + +**Agent workflow** (same as MCP workflow from mcp-server.md): +``` +Agent: "I need to spawn a worktree for the auth feature" + → hub.search({ q: "spawn" }) → [{ tool: "coord.spawn", description: "..." }] + → hub.schema({ tool: "coord.spawn" }) → { inputSchema: { sessionId, task, branch, ... }, ... } + → hub.call({ calls: [{ tool: "coord.spawn", input: { ... 
} }] }) + +Agent: "Let me also check status and send a message" + → hub.call({ calls: [ + { tool: "coord.status", input: { parentSessionId: "..." } }, + { tool: "coord.message", input: { sessionId: "...", body: "..." } } + ] }) ``` -No adapter layer, no `tool()` wrapper, no AI SDK dependency. The operations registry already stores JSON Schema via TypeBox. The normalization step is necessary because OpenAI's function calling API has stricter JSON Schema requirements than TypeBox's default output. +**No `operationToOpenAITool()` or `normalizeSchemaForOpenAI()` needed**: The 4 meta-tool schemas are hand-defined, small, and don't change when operations are added or modified. The previous approach of converting each `IOperationDefinition.inputSchema` (TypeBox → JSON Schema) to OpenAI's function calling format required normalization (OpenAI requires `additionalProperties: false`, top-level `type: "object"`, no `$ref`, no `patternProperties`). That entire problem disappears. ### Streaming format for the proxy @@ -269,7 +273,7 @@ Per project convention (AGENTS.md: "Pin dependency versions in deno.json — upd | Document | Change | Status | |----------|--------|--------| | `AGENTS.md` | Remove AI SDK from External Dependencies and Constraints. Add `openai` with pinned version. Update `src/inference/` description. | ✅ Done | -| `docs/architecture/agent-sessions.md` | Remove `streamText`/`generateText`/`proxyProvider`/`operationToTool` references. Replace with `AgentLoop` using `openai` SDK and `operationToOpenAITool` mapping. Update session data shapes. | Pending | +| `docs/architecture/agent-sessions.md` | Remove `streamText`/`generateText`/`proxyProvider`/`operationToTool` references. Replace with `AgentLoop` using `openai` SDK and 4 meta-tools (hub.list/search/schema/call). Document unified MCP/direct agent tool interface. Update session data shapes. | Pending | | `docs/architecture/open-questions.md` | Add OQ-63, OQ-64, OQ-65. Add Theme 11. Add ADR-018 to resolved table. 
Add inference chain to cross-cutting dependencies. | ✅ Done | | `docs/architecture/packages.md` | Replace "Agent sessions (AI SDK)" with "Agent sessions (openai SDK + AgentLoop)" or similar. | Pending | @@ -280,38 +284,38 @@ Per project convention (AGENTS.md: "Pin dependency versions in deno.json — upd 1. **Reduced supply chain attack surface**: Zero transitive dependencies from the LLM calling path. The `openai` package has zero runtime dependencies and is auto-generated from OpenAPI spec. 2. **No AI SDK release cadence coupling**: We update the `openai` package on our schedule, not at 2-5 releases/day. 3. **Reduced bundle size**: The AI SDK core (`ai`) is ~50 kB minified, `@ai-sdk/provider` adds ~19.5 kB, plus `@ai-sdk/provider-utils` and transitive deps. The `openai` package is ~129.5 kB but with zero transitive deps — total install footprint is significantly smaller than `ai` + its dependency tree. More importantly, the hub's own streaming code (~300 LOC for the SSE transformer + AgentLoop) is a fraction of the AI SDK's ~2700 lines of `streamText()` alone, and we only ship what we use. -3. **Hub-own streaming protocol**: We define and evolve the SSE chunk types we need without waiting for AI SDK releases. New part types or chunk types can be added immediately. -4. **Simpler code paths**: No `proxyProvider()` factory, no `operationToTool()` adapter, no `LanguageModelV3` interface implementation. Direct `openai` SDK calls + JSON Schema tool definitions + explicit `AgentLoop`. -5. **Consistent with existing patterns**: The operations registry already uses TypeBox → JSON Schema. Mapping operations to OpenAI tool format is a JSON Schema transform, not an adapter to a third-party type system. -6. **Consistent with ADR-015 and ADR-016**: We've removed opencode's influence on the hub's data model. Removing the AI SDK continues this pattern — the hub owns its types, its streaming protocol, and its tool calling format. -7. 
**Explicit agent loop**: The `AgentLoop` is hub code that we can debug, extend, and add observability to. Multi-step tool execution, max steps, error recovery, and usage tracking are all visible and modifiable. The AI SDK's `streamText()` hides this loop inside ~2700 lines of framework code. +4. **Hub-own streaming protocol**: We define and evolve the SSE chunk types we need without waiting for AI SDK releases. New part types or chunk types can be added immediately. +5. **Unified tool interface for MCP and direct agents**: The same 4 discovery+call meta-tools (`hub.list`/`hub.search`/`hub.schema`/`hub.call`) serve both MCP clients and OpenAI tool-calling agents. No per-operation schema conversion, no `normalizeSchemaForOpenAI()`, no adapter layers. The LLM discovers operations the same way an MCP client does — search, get schema, call. +6. **Consistent with existing patterns**: The operations registry and call protocol are unchanged. Direct agents route tool calls through `callMap.call()` — the same path as MCP clients and spoke calls. +7. **Consistent with ADR-015 and ADR-016**: We've removed opencode's influence on the hub's data model. Removing the AI SDK continues this pattern — the hub owns its types, its streaming protocol, and its tool calling format. +8. **Explicit agent loop**: The `AgentLoop` is hub code that we can debug, extend, and add observability to. Multi-step tool execution, max steps, error recovery, and usage tracking are all visible and modifiable. The AI SDK's `streamText()` hides this loop inside ~2700 lines of framework code. ### Negative 1. **More code to maintain**: The `AgentLoop`, streaming state machine, and tool execution orchestration are additional hub code. However, this code is bounded (~900 lines total), well-understood (LLM → tool call → execute → feed result → repeat), and has clear input/output formats. The AI SDK's equivalent is ~2700 lines of `streamText()` + the provider abstraction + the tool framework. 2. 
**No multi-provider abstraction**: The AI SDK lets you swap providers with one line (`anthropic(...)` → `openai(...)`). With the `openai` SDK, we're locked to OpenAI-compatible APIs. But the hub's proxy already abstracts this — all LLM calls go through `/v1/chat/completions`, and the proxy routes to providers. Adding a new provider means adding a route in the proxy, not swapping AI SDK providers. For providers that don't support OpenAI-compatible APIs (e.g., Anthropic native), the proxy translates the format. 3. **No AI SDK React hooks**: We can't use `useChat` or `useCompletion` on the frontend. The hub doesn't have a React frontend — it has an API server. Frontend concerns are out of scope. -4. **Tool calling type safety**: The AI SDK's `tool()` function provides Zod-based type safety for tool input/output. We lose that. But our operations registry already provides TypeBox-based type safety — we're mapping TypeBox schemas to OpenAI's `parameters` field, which is JSON Schema (which TypeBox produces natively). +4. **No per-operation type safety in tool calls**: The AI SDK's `tool()` function provides Zod-based type safety for each tool's input/output. With meta-tools, `hub.call` accepts `{ tool: string, input?: unknown }` — the LLM gets the schema via `hub.schema` and constructs the input. The call protocol validates the input against the operation's TypeBox schema at execution time, so invalid inputs are caught, but hub code gets no compile-time type checking for individual operation calls. This is the same trade-off MCP clients accept — it's inherent to the discovery+call model. 
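
The execution-time validation trade-off described in item 4 can be illustrated with a minimal sketch. Everything here is hypothetical: `OperationEntry`, `operations`, `validateCall`, and `hubCall` are illustrative names, the hand-written `check` functions stand in for TypeBox schema validation, and the real hub would route through `callMap.call()` rather than a local map.

```typescript
// Shapes taken from the ADR's description of hub.call; names are illustrative.
type CallRequest = { tool: string; input?: unknown };
type CallResult =
  | { success: true; result: unknown }
  | { success: false; error: string };

// Stand-in for a registry entry: a runtime check plus a handler.
// The real hub would validate against the operation's TypeBox schema.
interface OperationEntry {
  check: (input: unknown) => string | null; // null means the input is valid
  run: (input: unknown) => Promise<unknown>;
}

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null;

const operations = new Map<string, OperationEntry>([
  ["coord.status", {
    check: (input) =>
      isRecord(input) && typeof input.parentSessionId === "string"
        ? null
        : "parentSessionId must be a string",
    run: async (input) => ({
      parent: (input as { parentSessionId: string }).parentSessionId,
      state: "idle",
    }),
  }],
]);

// Input is `unknown` at compile time, so problems surface only here, at runtime.
function validateCall(call: CallRequest): string | null {
  const op = operations.get(call.tool);
  if (!op) return `unknown tool: ${call.tool}`;
  return op.check(call.input);
}

// Each call in the batch gets its own success/error entry; one failure
// does not reject the whole batch.
async function hubCall(calls: CallRequest[]): Promise<CallResult[]> {
  return Promise.all(calls.map(async (call): Promise<CallResult> => {
    const problem = validateCall(call);
    if (problem !== null) return { success: false, error: problem };
    try {
      const result = await operations.get(call.tool)!.run(call.input);
      return { success: true, result };
    } catch (err) {
      return { success: false, error: err instanceof Error ? err.message : String(err) };
    }
  }));
}
```

Returning a structured error per call, rather than throwing, lets the LLM read the validation message in the tool result and retry with corrected input.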
### Implementation scope | Component | Estimated effort | Notes | |-----------|-----------------|-------| | `UIMessage` + `UIPart` + `ToolCallState` type definitions | Small (~60 lines) | Plain TypeScript interfaces | -| `operationToOpenAITool()` + schema normalization | Small-Medium (~80 lines) | JSON Schema normalization for OpenAI strict mode (~30-50 lines) + mapping | +| 4 meta-tool OpenAI definitions + schema | Small (~40 lines) | `hub.list`/`hub.search`/`hub.schema`/`hub.call` — same schemas as MCP server, wrapped as OpenAI function definitions | | OpenAI proxy SSE handler (Hono) | Medium (~250 lines) | Transform OpenAI SSE → hub chunk format, includes step boundary events | -| `AgentLoop` — multi-step tool execution loop | Medium (~300 lines) | Step management, tool call detection, tool execution via registry, result feeding, max steps, usage accumulation | +| `AgentLoop` — multi-step tool execution loop | Medium (~250 lines) | Step management, tool call detection, routing via callMap.call(), result feeding, max steps, usage accumulation. Simpler than before — tool calls all go through the same call protocol path. | | Direct agent stream consumer | Small (~80 lines) | Consume `openai` SDK streaming response, emit hub chunk events | | Part persistence from stream | Medium-Large (~250 lines) | Map stream chunks to `parts` table inserts/updates, buffered write strategy (flush on `*-end` events), state transitions | | Proxy key routing | Small (~50 lines) | Resolve `clients` + `client_secrets` for provider keys | | Error handling + retry logic | Small-Medium (~80 lines) | Exponential backoff for 429/5xx, non-retryable error mapping | -**Total: ~1100 lines** of focused, well-bounded code with clear input/output formats. +**Total: ~1060 lines** of focused, well-bounded code with clear input/output formats. The `AgentLoop` is the most significant component. 
Its contract is simple: -- **Input**: messages + tool definitions + model config +- **Input**: messages + 4 meta-tool definitions + model config - **Output**: SSE stream of hub chunk events + final UIMessage + usage data -- **Loop**: call → accumulate → detect tools → execute → feed → repeat +- **Loop**: call → accumulate → detect tools → route through callMap.call() → feed → repeat The AI SDK's `streamText()` handles this loop in ~2700 lines (including provider abstraction, middleware hooks, multi-model smoothing, and edge cases we don't need). Our `AgentLoop` handles exactly our use case in ~300 lines. @@ -320,7 +324,7 @@ The AI SDK's `streamText()` handles this loop in ~2700 lines (including provider | OQ | Impact | |----|--------| | OQ-16 | **Simplified**: ADR-016 resolved this — hub owns its schema. This ADR extends that to TypeScript types. The hub defines `UIMessage`, `UIPart`, and `ToolCallState` types. | -| Agent sessions architecture (`agent-sessions.md`) | **Needs update**: Remove `streamText`/`generateText`/`proxyProvider`/`operationToTool` references. Replace with `AgentLoop` using `openai` SDK and `operationToOpenAITool` mapping. Document the two streaming paths producing the same output format. | +| Agent sessions architecture (`agent-sessions.md`) | **Needs update**: Remove `streamText`/`generateText`/`proxyProvider`/`operationToTool` references. Replace with `AgentLoop` using `openai` SDK and 4 meta-tools. Document the unified MCP/direct agent tool interface. | | `AGENTS.md` Constraints and Dependencies | **Needs update**: Remove AI SDK from dependencies and constraints. Add `openai` package with pinned version. Update `src/inference/` description. 
| ### Open questions created @@ -335,6 +339,7 @@ The AI SDK's `streamText()` handles this loop in ~2700 lines (including provider - [ADR-015: Dev spoke instead of opencode integration](ADR-015-dev-spoke-not-opencode.md) — removed opencode dependency - [ADR-016: Hub-own schema](ADR-016-hub-own-schema.md) — hub owns session/message/part schema +- [MCP server: Discovery + Call Interface](../architecture/mcp-server.md) — the 4 meta-tools model that now serves both MCP and direct agents - [AI SDK supply chain risk assessment](../research/ai-sdk-supply-chain-risk.md) — detailed analysis of AI SDK risks - [agent-sessions.md](../architecture/agent-sessions.md) — current session architecture (references AI SDK) - [OpenAI Node SDK](https://github.com/openai/openai-node) — zero-dependency, auto-generated from OpenAPI spec \ No newline at end of file
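
As a rough illustration of the "4 meta-tool OpenAI definitions" component in the implementation scope, the sketch below wraps the four tools as OpenAI function-tool objects. It is a sketch under assumptions, not the hub's code: `FunctionTool` is a local type mirroring OpenAI's documented tool shape, the descriptions are paraphrased from the meta-tools table, and `toWireName` and `wireToHub` are hypothetical helpers. One point worth checking before implementation: OpenAI documents function names as limited to letters, digits, underscores, and dashes, so dotted names such as `hub.list` may be rejected by some providers. The sketch maps them to `hub_list` on the wire, which is an assumption rather than something this ADR specifies.

```typescript
// Local type mirroring OpenAI's documented function-tool shape.
interface FunctionTool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

const HUB_TOOLS = ["hub.list", "hub.search", "hub.schema", "hub.call"] as const;
type HubTool = (typeof HUB_TOOLS)[number];

// Assumption: replace dots so names fit OpenAI's documented name pattern.
const toWireName = (tool: string): string => tool.replace(/\./g, "_");

// Reverse lookup built from the known names, because "_" to "." is ambiguous.
const wireToHub = new Map<string, HubTool>(
  HUB_TOOLS.map((t): [string, HubTool] => [toWireName(t), t]),
);

// Small helper for closed object schemas.
const objectSchema = (
  properties: Record<string, unknown>,
  required: string[] = [],
): Record<string, unknown> => ({
  type: "object",
  properties,
  required,
  additionalProperties: false,
});

const define = (
  tool: HubTool,
  description: string,
  parameters: Record<string, unknown>,
): FunctionTool => ({
  type: "function",
  function: { name: toWireName(tool), description, parameters },
});

const metaTools: FunctionTool[] = [
  define("hub.list", "List available operations, optionally filtered by namespace",
    objectSchema({ namespace: { type: "string" } })),
  define("hub.search", "Search operations by query string and/or namespace",
    objectSchema({ q: { type: "string" }, namespace: { type: "string" } })),
  define("hub.schema", "Get input and output schemas for a specific operation",
    objectSchema({ tool: { type: "string" } }, ["tool"])),
  define("hub.call", "Execute one or more operations via the call protocol",
    objectSchema({
      calls: {
        type: "array",
        items: objectSchema({ tool: { type: "string" }, input: { type: "object" } }, ["tool"]),
      },
    }, ["calls"])),
];
```

Because the array is static, it can be passed unchanged as the `tools` parameter on every LLM call, and adding or changing operations in the registry does not touch it.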