ADR-018: Remove AI SDK, use openai SDK directly with hub-own streaming

Replace the Vercel AI SDK with direct OpenAI SDK calls and a custom AgentLoop. The AI SDK has zero runtime integration today, so removing it costs nothing. Supply chain risk (2-5 releases/day, April 2026 Vercel breach, bus factor of 1) makes it a liability we don't need.

Key changes:
- ADR-018 accepted: openai package (zero runtime deps) replaces ai SDK
- AgentLoop handles multi-step tool execution explicitly (~300 LOC vs AI SDK's ~2700 LOC streamText)
- Hub owns UIMessage/UIPart/ToolCallState types (extends ADR-016)
- Hub owns streaming protocol (subset of AI SDK's UIMessageChunk wire format with step boundaries, error handling, usage tracking)
- operationToOpenAITool() maps TypeBox schemas directly, no adapter
- Trade-off: ~1100 LOC total new code for the savings of 6+ transitive deps, supply chain risk, and release cadence coupling

Updates AGENTS.md constraints and dependencies, adds OQ-63/OQ-64/OQ-65 and Theme 11 (Inference & LLM Integration) to open questions.
2026-05-26 08:55:52 +00:00
parent 2d7f9c11cb
commit a248698f40
4 changed files with 634 additions and 3 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -18,7 +18,7 @@ src/
  auth/         — API key auth (keypal), session tokens
  coordination/ — coord.spawn/status/message/notify/abort/detect
  redis/        — Redis EventTarget setup, event routing
-  inference/    — OpenAI-compatible proxy, LLM key management
+  inference/    — OpenAI-compatible proxy, LLM key management, AgentLoop, hub-own streaming
 docs/
  architecture/ — Architecture specs (see overview.md for index)
  decisions/    — ADRs
@@ -43,6 +43,7 @@ migrations/     — Drizzle SQL migrations
 | `drizzle-orm` | 0.45.2 | Postgres ORM |
 | `ioredis` | 5.10.1 | Redis client |
 | `keypal` | 0.2.0 | API key management |
 | `openai` | TBD (pin on add) | OpenAI API client for LLM calls (zero runtime deps) |
 | `pg` | 8.21.0 | Postgres driver |
 | `@hono/mcp` | 0.3.0 | MCP server middleware |
 | `@logtape/logtape` | 2.1.1 | Structured logging |
@@ -79,7 +80,7 @@ deno lint
 - Deno runtime (latest stable), TypeScript strict mode
 - Postgres for all persistent state (Drizzle ORM)
 - Redis for cross-process events (ioredis)
-- AI SDK (`ai` package) for LLM streaming (not Effect)
+- OpenAI SDK (`openai` package) for LLM calls — no AI SDK (see ADR-018)
 - TypeBox for all runtime schemas (`@alkdev/typebox`, not Zod or @sinclair/typebox)
 - Hono for HTTP server
 - WebSocket for hub<->spoke transport (not SSE as primary)
--- a/docs/architecture/open-questions.md
+++ b/docs/architecture/open-questions.md
@@ -28,6 +28,7 @@ Cross-cutting compilation of all unresolved questions across the hub architectur
 | 8. Deployment & Operations | OQ-34–OQ-37 | Migrations, hot spare, observability, Redis topology |
 | 9. Cross-Cutting Implementation Gaps | OQ-38–OQ-50 | Startup, config, logger, Gitea, keypal, auth, schemas |
 | 10. Future / Low Priority | OQ-51–OQ-60 | Phase 3+, memory, versioning, visualization |
 | 11. Inference & LLM Integration | OQ-63–OQ-65 | Streaming protocol, SDK choice, part persistence |
 ### Resolved by ADRs
@@ -37,6 +38,7 @@ Cross-cutting compilation of all unresolved questions across the hub architectur
 | [ADR-015](../decisions/ADR-015-dev-spoke-not-opencode.md) | OQ-16, OQ-17, OQ-26, OQ-28, OQ-51, OQ-55 |
 | [ADR-016](../decisions/ADR-016-hub-own-schema.md) | OQ-18, OQ-19 (confirmed) |
 | [ADR-017](../decisions/ADR-017-hub-first-roles.md) | OQ-26, OQ-28, OQ-51 (overlaps with ADR-015) |
 | [ADR-018](../decisions/ADR-018-no-ai-sdk-direct-openai-proxy.md) | OQ-16 (extended to TypeScript types) |
 ---
@@ -537,6 +539,34 @@ Cross-cutting compilation of all unresolved questions across the hub architectur
 ---
 ## Theme 11: Inference & LLM Integration
 ### OQ-63: What is the exact subset of UIMessageChunk types the hub proxy emits?
 - **Origin**: [ADR-018](../decisions/ADR-018-no-ai-sdk-direct-openai-proxy.md)
 - **Status**: open
 - **Priority**: medium
 - **Question**: ADR-018 defines an initial subset of AI SDK's UIMessageChunk protocol for the hub's SSE streaming format. The initial set covers text, reasoning, tool call lifecycle, step boundaries, and error events. As features are added (e.g., source URLs, file attachments, dynamic tools), new chunk types need to be specified. Should the hub define a formal schema for its streaming protocol, or document it informally? How do we version the protocol if chunk types change?
 - **Cross-references**: OQ-64 (raw HTTP vs SDK streaming)
 ### OQ-64: Should the direct agent use the openai SDK's streaming API or raw HTTP?
 - **Origin**: [ADR-018](../decisions/ADR-018-no-ai-sdk-direct-openai-proxy.md)
 - **Status**: open
 - **Priority**: low
 - **Question**: The direct agent path can use the `openai` SDK's typed streaming API (`client.chat.completions.create({ stream: true })`) or raw HTTP for more control over SSE parsing. The SDK provides convenience (typed chunks, plus tool call accumulation through its `.stream()` helper) but adds abstraction. The proxy path must use raw HTTP (Hono SSE handler). Should both paths use the same approach for consistency, or is it acceptable to use the SDK for the direct agent and raw HTTP for the proxy?
 - **Cross-references**: OQ-63 (streaming protocol)
 ### OQ-65: What is the buffered write strategy for part persistence?
 - **Origin**: [ADR-018](../decisions/ADR-018-no-ai-sdk-direct-openai-proxy.md)
 - **Status**: open
 - **Priority**: medium
 - **Question**: Streaming LLM responses produce many part updates (text deltas, state transitions, tool call results). Writing each delta as a separate database write would be extremely expensive. Options: (a) flush on `*-end` events (per-part commits — text parts committed when done, tool parts committed when complete), (b) flush on `step-finish` (per-step commits — all parts in a step committed together), (c) flush on `finish` (per-message commits — all parts committed when the agent turn is complete). Per-part (a) balances latency and write volume best for real-time SSE updates.
 - **Cross-references**: OQ-63 (streaming protocol defines when `*-end` events fire)
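Option (a) can be pictured as a small in-memory buffer that swallows deltas and only yields a part once its `*-end` event arrives. The sketch below is illustrative only: `PartBuffer`, `FlushedPart`, and the chunk field names (`id`, `delta`) are assumptions, not settled hub types, and only text parts are covered.

```typescript
// Hypothetical sketch of option (a): buffer deltas, yield one write per part.
type TextChunk =
  | { type: "text-start"; id: string }
  | { type: "text-delta"; id: string; delta: string }
  | { type: "text-end"; id: string };

type FlushedPart = { id: string; type: "text"; text: string };

class PartBuffer {
  // Text accumulated so far for each part that is still streaming.
  private open = new Map<string, string>();

  /** Returns a part ready to persist, or null while the part is still streaming. */
  push(chunk: TextChunk): FlushedPart | null {
    switch (chunk.type) {
      case "text-start":
        this.open.set(chunk.id, "");
        return null;
      case "text-delta":
        this.open.set(chunk.id, (this.open.get(chunk.id) ?? "") + chunk.delta);
        return null;
      case "text-end": {
        const text = this.open.get(chunk.id) ?? "";
        this.open.delete(chunk.id);
        return { id: chunk.id, type: "text", text };
      }
    }
  }
}
```

A caller would persist whatever `push()` returns when it is non-null, so a text part streamed as hundreds of deltas costs a single database write. Options (b) and (c) would keep the same buffer but return parts in batches at `step-finish` or `finish` instead.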
 ---
 ## Cross-Cutting Dependencies
 These questions block each other or share resolution paths:
@@ -551,7 +581,7 @@ These questions block each other or share resolution paths:
 5. **Data Lifecycle Chain**: OQ-12 → OQ-13 → OQ-14 — Operation deletion strategy, call graph retention, and payload truncation interact. OQ-12 determines whether operations can be removed at all.
-6. **Dev Spoke Chain**: OQ-61 → OQ-62 → OQ-06 — Dev spoke operations and distribution need specification before spoke provisioning can be fully designed. OQ-11 is narrowed by ADR-015 but not resolved.
+7. **Inference Chain**: OQ-63 → OQ-64, OQ-65 — The streaming protocol subset (OQ-63) determines what the direct agent and proxy need to produce. The SDK vs. raw HTTP choice (OQ-64) and the persistence strategy (OQ-65) depend on the protocol definition.
 ---
@@ -621,6 +651,9 @@ These questions block each other or share resolution paths:
 | OQ-60 | Full ujsx call templates | call-graph | low | open |
 | OQ-61 | Dev spoke operations | ADR-015 | medium | open |
 | OQ-62 | Dev spoke distribution and config | ADR-015 | medium | open |
 | OQ-63 | Hub proxy SSE chunk type subset | ADR-018 | medium | open |
 | OQ-64 | Direct agent: openai SDK vs raw HTTP | ADR-018 | low | open |
 | OQ-65 | Part persistence buffered write strategy | ADR-018 | medium | open |
 ### High Priority Open Questions (Blocking)
--- a/docs/decisions/ADR-018-no-ai-sdk-direct-openai-proxy.md
+++ b/docs/decisions/ADR-018-no-ai-sdk-direct-openai-proxy.md
@@ -0,0 +1,340 @@
 # ADR-018: No AI SDK — direct OpenAI proxy with hub-own streaming
 - **Status**: Accepted
 - **Date**: 2026-05-26
 - **Deciders**: alkdev
 ## Context
 The hub was architected with the Vercel AI SDK (`ai` package + `@ai-sdk/*`) as a core dependency for LLM streaming. `agent-sessions.md` describes direct agents using `streamText()`/`generateText()` with `proxyProvider()` and `operationToTool()` bridging the operations registry to AI SDK tools. ADR-016 made AI SDK `UIMessage` the primary design constraint for the session/message/part schema.
 However, the AI SDK has **zero runtime integration today** — it appears only in architecture docs and `deno.json` has no `ai` import. The hub's `src/inference/` directory doesn't exist yet. This is the right time to remove it before it becomes entrenched.
 ### Supply chain risk
 The AI SDK presents moderate supply chain risk:
 1. **Extreme release cadence**: 2-5 releases/day across 3 version lines (1,224 total npm versions). Every release is surface area for compromise or regression.
 2. **April 2026 Vercel security incident**: A threat actor compromised a Vercel employee's Google Workspace account via a supply chain attack on Context.ai, gaining access to Vercel's internal systems. npm publish tokens were rotated after the breach. While no `ai` packages were confirmed compromised, the attack vector is real.
 3. **Bus factor of 1**: One dominant contributor (Lgrammel, 1,980 commits — 5x the #2 contributor). No CODEOWNERS file, no formal governance model.
 4. **Transitive dependency concerns**: `json-schema@0.4.0` is unmaintained with a single maintainer. `@vercel/oidc` is Vercel-specific infrastructure coupling (though only in `@ai-sdk/gateway`, which we wouldn't use).
 5. **Automated release pipeline**: Changesets auto-merge and auto-publish. A compromised maintainer account or malicious PR could publish a poisoned package.
 For comparison, the `openai` npm package has **zero runtime dependencies**, is auto-generated from OpenAPI spec, and releases ~1/week.
 ### Why not "AI SDK with hardening"?
 The supply chain risk assessment ([ai-sdk-supply-chain-risk.md](../research/ai-sdk-supply-chain-risk.md)) recommends "use the AI SDK with supply chain hardening" as its primary option. This ADR goes further and removes the AI SDK entirely. The reasoning:
 1. **Zero runtime integration = zero migration cost**: The hub has no `ai` import in any source file. There is nothing to migrate. Removing a planned dependency that hasn't been integrated yet is essentially free; adding it and removing it later would be expensive.
 2. **Ownership philosophy**: ADR-015 removed opencode because the hub should own its data model and execution model. ADR-016 established hub-own schema ownership. The same principle applies to the streaming protocol and message types — the hub should own these, not have them constrained by a third-party library's release cadence.
 3. **The proxy already abstracts provider routing**: The hub's OpenAI-compatible proxy (already architecturally committed) routes calls to providers. A new provider means adding a route in the proxy, not swapping AI SDK provider packages. The AI SDK's multi-provider abstraction provides no value in this architecture.
 4. **Security is cumulative**: Each supply chain attack surface removed is additive. We removed opencode (ADR-015) and reduced the attack surface. Removing the AI SDK continues this. We're building a platform for other people's production workloads — minimizing trust in external packages with high release cadence and corporate attack targets is a reasonable posture.
 5. **The code is bounded and well-understood**: The AI SDK's streaming protocol is well-specified. Reimplementing a subset that covers the hub's needs is ~1100 lines of focused code (see Implementation scope). This is not a risky unknown; it is a straightforward SSE transformation with clear input/output formats.
 ### What we actually need from the AI SDK
 The AI SDK provides three things the hub's architecture references:
 1. **`UIMessage` format** — role + parts array for session messages
 2. **`streamText()`/`generateText()`** — LLM calling with streaming, tool execution, and multi-step agent loops
 3. **`tool()` + `operationToTool()`** — bridging the operations registry to AI SDK tool definitions
 The proxy is already architecturally committed — `agent-sessions.md` describes `/v1/chat/completions` as a Hono HTTP endpoint. The question is whether we call OpenAI-compatible APIs through the AI SDK or directly through the `openai` npm package.
 ### What removing the AI SDK simplifies
 After ADR-015 removed the opencode integration, the AI SDK's role narrowed significantly. The `ai-sdk-provider-opencode-sdk` package is gone. "Runner agents" now run in the dev spoke — they call the hub's OpenAI proxy directly, no AI SDK involved on their side either.
 The only place the AI SDK was used was for "direct agents" running in the hub process. These agents:
 - Read messages from Postgres
 - Convert operations to tools
 - Call an LLM via `streamText()` (which handles multi-step tool execution internally)
 - Persist the response parts back to Postgres
 This is a bounded loop that the hub can implement directly, without the AI SDK's multi-provider abstraction, React hooks, or streaming protocol layers.
 ## Decision
 Remove the Vercel AI SDK as a dependency. The hub will:
 1. **Define its own `UIMessage` type** compatible with the AI SDK's format. ADR-016 already says the hub owns its schema — this extends that ownership to the TypeScript type. The type is a plain interface (role + parts array); there are no runtime dependencies.
 2. **Use the `openai` npm package directly** for LLM calls. Zero runtime dependencies, well-maintained, auto-generated from OpenAPI spec, compatible with Deno via npm specifiers.
 3. **Map operations to OpenAI tool calling format directly** — no `tool()` adapter needed. The operations registry already stores JSON Schema (via TypeBox). Converting `IOperationDefinition.inputSchema` to OpenAI's `{ type: "function", function: { name, description, parameters } }` format is a JSON Schema transform with normalization.
 4. **Implement hub-own streaming** for the proxy's SSE output. The proxy receives OpenAI SSE chunks and transforms them into the hub's stream format — a subset of the AI SDK's `UIMessageChunk` protocol that covers the part types the hub uses.
 5. **Implement the agent execution loop directly**. The AI SDK's `streamText()` handles multi-step tool execution loops internally. The hub will implement this loop explicitly: call LLM → detect tool calls → execute tools via registry → feed results back → repeat until the LLM produces a final response with no tool calls.
 ### Architecture changes
 **Before (AI SDK)**:
 ```
 Direct Agent → streamText() → proxyProvider('anthropic/...') → Hub Proxy → Provider
 Direct Agent → generateText() → proxyProvider('anthropic/...') → Hub Proxy → Provider
 Direct Agent → tool() → operationToTool() → registry.execute()
 Dev Spoke → HTTP POST → Hub Proxy → Provider
 ```
 **After (No AI SDK)**:
 ```
 Direct Agent → AgentLoop → openai SDK → Hub Proxy → Provider
                         ↕
                operationToOpenAITool() → registry.execute()
 Dev Spoke → HTTP POST → Hub Proxy → Provider
 ```
 Both paths go through the same proxy. The proxy adds the provider API key and forwards. The direct agent path uses the `openai` SDK pointed at `localhost` (the proxy). The dev spoke path makes HTTP requests to the proxy.
 ### Agent execution loop
 The AI SDK's `streamText()` handles multi-step tool execution internally: detect tool calls → execute → feed results → re-prompt → repeat. Without it, the hub must implement this loop explicitly.
 **The `AgentLoop`**:
 ```
 ┌─────────────────────────────────────────────────────┐
 │  1. Load session messages from Postgres              │
 │  2. Convert to OpenAI chat message format            │
 │  3. Convert hub operations to OpenAI tool definitions │
 │  4. Call LLM (via openai SDK, streaming)              │
 │  5. Emit stream events to client (SSE)               │
 │  6. Accumulate response                              │
 │  7. If response contains tool_calls:                  │
 │     a. Emit step-finish event                        │
 │     b. For each tool_call:                           │
 │        - Execute via registry.execute()               │
 │        - Emit tool-output-available event             │
 │     c. Append tool results to messages               │
 │     d. Emit step-start event                          │
 │     e. Go to step 4                                  │
 │  8. If response has no tool_calls:                    │
 │     a. Emit finish event (with usage data)            │
 │     b. Persist messages and parts to Postgres         │
 │     c. Done                                          │
 └─────────────────────────────────────────────────────┘
 ```
 **Step boundaries**: Each LLM call within a single agent turn is a "step." Steps are bounded by `step-start` and `step-finish` SSE events so clients can distinguish between the LLM's initial response and subsequent responses after tool execution.
 **Max steps**: Default 10 (configurable per session/role). Prevents infinite tool call loops. If the LLM is still requesting tool calls when the step limit is reached, the loop terminates with a `finish` event containing `finishReason: "max-steps"`.
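The control flow of the loop, including the step cap, can be sketched as follows. This is a minimal, non-streaming sketch under stated assumptions: `callLLM` and `executeTool` are hypothetical injected stand-ins for the `openai` SDK call and `registry.execute()`, SSE emission is reduced to an `emit` callback, and tools run sequentially for brevity.

```typescript
// Hypothetical names throughout; this only illustrates the loop's shape.
type ToolCall = { id: string; name: string; args: unknown };
type LLMTurn = { text: string; toolCalls: ToolCall[] };
type ChatMessage =
  | { role: "system" | "user" | "assistant"; content: string; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };
type LoopEvent =
  | { type: "step-start" }
  | { type: "step-finish" }
  | { type: "tool-output-available"; toolCallId: string; output: unknown }
  | { type: "finish"; finishReason: "stop" | "max-steps" };

interface LoopDeps {
  callLLM(messages: ChatMessage[]): Promise<LLMTurn>;
  executeTool(call: ToolCall): Promise<unknown>;
  emit(event: LoopEvent): void;
  maxSteps?: number; // defaults to 10, as in the ADR
}

async function runAgentLoop(messages: ChatMessage[], deps: LoopDeps): Promise<ChatMessage[]> {
  const maxSteps = deps.maxSteps ?? 10;
  const history = [...messages];
  for (let step = 1; step <= maxSteps; step++) {
    // A step-start is only emitted after tool results were fed back.
    if (step > 1) deps.emit({ type: "step-start" });
    const turn = await deps.callLLM(history);
    history.push({ role: "assistant", content: turn.text, toolCalls: turn.toolCalls });
    if (turn.toolCalls.length === 0) {
      deps.emit({ type: "finish", finishReason: "stop" });
      return history;
    }
    deps.emit({ type: "step-finish" });
    for (const call of turn.toolCalls) {
      const output = await deps.executeTool(call);
      deps.emit({ type: "tool-output-available", toolCallId: call.id, output });
      history.push({ role: "tool", toolCallId: call.id, content: JSON.stringify(output) });
    }
  }
  // The model was still requesting tools after the last permitted step.
  deps.emit({ type: "finish", finishReason: "max-steps" });
  return history;
}
```

One design choice in this sketch is that tool calls from the last permitted step are still executed before the loop stops; terminating before executing them would be an equally defensible reading of the limit.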
 **Error handling**: If a tool execution fails, the loop reports the error to the LLM as a tool result with `errorText` and continues the loop. The LLM can choose to retry, use a different tool, or explain the error to the user. If the LLM call itself fails (rate limit, network error), the hub retries with exponential backoff (max 3 retries for 429/5xx errors). Non-retryable errors (4xx except 429, context window exceeded) are emitted as `error` stream events and the loop terminates.
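The retry policy for the LLM call can be sketched like this. The helper names, the 500 ms base delay, and the injected `sleep` are assumptions for illustration; the sketch reads a numeric `status` field from the thrown error, and it does not retry errors that carry no status (such as raw network failures), which a real implementation would need to classify as well.

```typescript
// Hypothetical sketch of "max 3 retries with exponential backoff for 429/5xx".

/** Extract an HTTP status from an unknown error, if it carries one. */
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

/** 429 and 5xx are retryable; every other 4xx is not. */
function isRetryable(status: number | undefined): boolean {
  return status === 429 || (status !== undefined && status >= 500 && status <= 599);
}

/** Exponential backoff: baseMs * 2^attempt, with attempt starting at 0. */
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt;
}

async function withRetry<T>(
  call: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
  maxRetries = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on non-retryable errors, or once the retry budget is spent.
      if (!isRetryable(statusOf(err)) || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Injecting `sleep` keeps the policy testable without real timers. Adding jitter to the delay would be a reasonable extension when many sessions share one provider key.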
 **Usage tracking**: The `stream_options: { include_usage: true }` parameter is sent with each LLM call, so each step's stream ends with a chunk carrying that step's usage data (prompt tokens, completion tokens). The hub accumulates usage across all steps and includes the total in the `finish` event. The hub's `clients` type `llm-provider` stores cost metadata; the session's `data` column records total usage per turn.
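Accumulating usage across steps amounts to a small fold. In this sketch the snake_case fields mirror the OpenAI `usage` object, while `Usage`, `ZERO_USAGE`, `addUsage`, and `toFinishUsage` are hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch: sum per-step usage into a per-turn total.
type Usage = { prompt_tokens: number; completion_tokens: number; total_tokens: number };

const ZERO_USAGE: Usage = { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 };

/** Add one step's usage to the running total; a missing usage chunk adds nothing. */
function addUsage(total: Usage, step: Usage | null | undefined): Usage {
  if (!step) return total;
  return {
    prompt_tokens: total.prompt_tokens + step.prompt_tokens,
    completion_tokens: total.completion_tokens + step.completion_tokens,
    total_tokens: total.total_tokens + step.total_tokens,
  };
}

/** Map the accumulated total to the camelCase shape used in the finish event. */
function toFinishUsage(total: Usage) {
  return {
    promptTokens: total.prompt_tokens,
    completionTokens: total.completion_tokens,
    totalTokens: total.total_tokens,
  };
}
```

Because every step resends the conversation so far, the accumulated prompt token count grows faster than the visible transcript; that is expected and is what the provider bills.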
 **Concurrent tool calls**: OpenAI responses can include multiple tool calls in a single response. The hub executes all tool calls in a step concurrently (via `Promise.all`), collects results, then continues the loop. All tool results are appended to messages before the next LLM call.
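A sketch of concurrent execution, assuming a hypothetical `execute` callback in place of `registry.execute()`. Each call catches its own failure so that one failing tool becomes an `error` result for the LLM instead of rejecting the whole `Promise.all`.

```typescript
// Hypothetical sketch: run all tool calls of a step concurrently.
type ToolCall = { id: string; name: string; args: unknown };
type ToolResult =
  | { toolCallId: string; state: "result"; output: unknown }
  | { toolCallId: string; state: "error"; errorText: string };

async function executeToolCalls(
  calls: ToolCall[],
  execute: (name: string, args: unknown) => Promise<unknown>,
): Promise<ToolResult[]> {
  return Promise.all(
    calls.map(async (call): Promise<ToolResult> => {
      try {
        const output = await execute(call.name, call.args);
        return { toolCallId: call.id, state: "result", output };
      } catch (err) {
        // Report the failure as a tool result so the loop can continue.
        const errorText = err instanceof Error ? err.message : String(err);
        return { toolCallId: call.id, state: "error", errorText };
      }
    }),
  );
}
```

`Promise.all` preserves input order, so results line up with the original `tool_calls` array regardless of which tool finishes first.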
 ### `UIMessage` type ownership
 ADR-016 already established that the hub owns its schema. We now also own the TypeScript type definition:
 ```ts
 // src/inference/types.ts
 /** Tool call lifecycle states. */
 type ToolCallState =
  | "streaming"   // arguments are being streamed (tool-input-delta events)
  | "call"        // arguments complete, awaiting execution
  | "result"      // tool executed successfully, output available
  | "error";      // tool execution failed, errorText available
 /** Compatible with AI SDK UIMessage but owned by the hub. */
 type UIPart =
  | { type: "text"; text: string; state?: "streaming" | "done" }
  | { type: "reasoning"; text: string; state?: "streaming" | "done" }
  | { type: "tool"; toolCallId: string; toolName: string; state: ToolCallState; input?: unknown; output?: unknown; errorText?: string }
  | { type: "file"; mediaType: string; url: string; filename?: string }
  | { type: "source-url"; sourceId: string; url: string; title?: string }
  | { type: "step-start" }
  | { type: "data"; id?: string; data: unknown; transient?: boolean };
 type UIMessage = {
  id: string;
  role: "system" | "user" | "assistant";
  parts: UIPart[];
  metadata?: {
    model?: string;
    provider?: string;
    tokens?: { prompt: number; completion: number; total: number };
    cost?: number;
    finishReason?: string;
    [key: string]: unknown;
  };
 };
 ```
 This is a **starting subset** of the AI SDK's part types (which includes `source-document`, `dynamic-tool`, `approval-requested`, etc.). We add types as the hub needs them. Import compatibility with opencode sessions remains possible through a mapping layer.
 **Note on `metadata`**: The `metadata` field is typed as a structured object (not `unknown`) because the hub always populates it with model, provider, usage, and finish reason data from the LLM response. The `[key: string]: unknown` index signature allows extensibility without losing type safety for the known fields.
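Step 2 of the loop, converting stored messages to the OpenAI chat format, might look like the sketch below. It redeclares a trimmed copy of the types above so it stands alone; `toOpenAIMessages`, `pushStep`, and `textOf` are hypothetical helpers, and the choice to split assistant messages at `step-start` parts and to skip tool parts that never completed is an assumption, not a settled rule.

```typescript
// Hypothetical sketch: hub UIMessage[] -> OpenAI chat messages.
type ToolCallState = "streaming" | "call" | "result" | "error";
type UIPart =
  | { type: "text"; text: string }
  | {
    type: "tool"; toolCallId: string; toolName: string; state: ToolCallState;
    input?: unknown; output?: unknown; errorText?: string;
  }
  | { type: "step-start" };
type UIMessage = { id: string; role: "system" | "user" | "assistant"; parts: UIPart[] };

type OpenAIToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };
type AssistantMessage = { role: "assistant"; content: string | null; tool_calls?: OpenAIToolCall[] };
type OpenAIMessage =
  | { role: "system" | "user"; content: string }
  | AssistantMessage
  | { role: "tool"; tool_call_id: string; content: string };

function textOf(parts: UIPart[]): string {
  let text = "";
  for (const p of parts) if (p.type === "text") text += p.text;
  return text;
}

/** Emit one assistant message, followed by its tool results, for a single step. */
function pushStep(step: UIPart[], out: OpenAIMessage[]): void {
  const toolCalls: OpenAIToolCall[] = [];
  const results: OpenAIMessage[] = [];
  for (const p of step) {
    // Skip tool parts that never completed ("streaming" or "call").
    if (p.type !== "tool" || (p.state !== "result" && p.state !== "error")) continue;
    toolCalls.push({
      id: p.toolCallId,
      type: "function",
      function: { name: p.toolName, arguments: JSON.stringify(p.input ?? {}) },
    });
    const content = p.state === "error" ? (p.errorText ?? "error") : JSON.stringify(p.output ?? null);
    results.push({ role: "tool", tool_call_id: p.toolCallId, content });
  }
  const text = textOf(step);
  if (text === "" && toolCalls.length === 0) return; // nothing to send for this step
  const assistant: AssistantMessage = { role: "assistant", content: text === "" ? null : text };
  if (toolCalls.length > 0) assistant.tool_calls = toolCalls;
  out.push(assistant, ...results);
}

function toOpenAIMessages(messages: UIMessage[]): OpenAIMessage[] {
  const out: OpenAIMessage[] = [];
  for (const msg of messages) {
    if (msg.role !== "assistant") {
      out.push({ role: msg.role, content: textOf(msg.parts) });
      continue;
    }
    let step: UIPart[] = [];
    for (const part of msg.parts) {
      if (part.type === "step-start") {
        pushStep(step, out);
        step = [];
      } else {
        step.push(part);
      }
    }
    pushStep(step, out);
  }
  return out;
}
```

Splitting at `step-start` matters because the chat format expects each assistant message with `tool_calls` to be followed by its `tool` messages before the next assistant message.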
 ### Operation → OpenAI tool mapping
 ```ts
 // Assumes the import map in deno.json resolves "openai" to the pinned npm package.
 import type OpenAI from "openai";

 function operationToOpenAITool(spec: IOperationDefinition): OpenAI.Chat.Completions.ChatCompletionTool {
  const schema = normalizeSchemaForOpenAI(spec.inputSchema);
  return {
    type: "function",
    function: {
      // NOTE: OpenAI restricts function names to [a-zA-Z0-9_-] (max 64 chars),
      // so the "." separator needs a reversible replacement for providers that enforce this.
      name: `${spec.namespace}.${spec.name}`,
      description: spec.description,
      parameters: schema,
      strict: true,  // enable structured outputs when the operation schema supports it
    },
  };
 }
 /**
 * TypeBox produces JSON Schema, but OpenAI function calling has specific requirements:
 * - Top-level must be object type with properties
 * - additionalProperties: false on every object schema, and every property listed in
 *   `required` (both required for strict mode)
 * - nested $ref needs resolution (TypeBox typically produces inline schemas)
 * - patternProperties, oneOf/anyOf with complex merging may not translate
 * This function normalizes TypeBox output for OpenAI compatibility.
 */
 function normalizeSchemaForOpenAI(schema: Record<string, unknown>): Record<string, unknown> {
  // ~30-50 lines of normalization:
  // 1. Ensure top-level type: "object"
  // 2. Set additionalProperties: false for strict mode
  // 3. Strip unsupported keywords (patternProperties, etc.)
  // 4. Resolve $ref if present (unusual for TypeBox, but defensive)
  // ...
 }
 ```
 No adapter layer, no `tool()` wrapper, no AI SDK dependency. The operations registry already stores JSON Schema via TypeBox. The normalization step is necessary because OpenAI's function calling API has stricter JSON Schema requirements than TypeBox's default output.
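One way the normalization could be filled in is sketched below. It covers only the first three steps from the comment above (object check, `additionalProperties: false`, keyword stripping); the list of stripped keywords is an assumption seeded from that comment, `$ref` resolution is omitted, and strict mode's expectation that every property appears in `required` is not handled here.

```typescript
// Hypothetical sketch of normalizeSchemaForOpenAI; not the hub's implementation.
type JsonSchema = Record<string, unknown>;

// Keywords dropped by this sketch. The list is an assumption.
const STRIPPED_KEYWORDS = new Set(["patternProperties"]);

function isSchema(value: unknown): value is JsonSchema {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/** Returns a normalized copy of one schema node; the input is not mutated. */
function normalizeNode(node: JsonSchema): JsonSchema {
  const out: JsonSchema = {};
  for (const [key, value] of Object.entries(node)) {
    if (!STRIPPED_KEYWORDS.has(key)) out[key] = value;
  }
  if (out.type === "object") {
    const rawProps = out.properties;
    const props: JsonSchema = isSchema(rawProps) ? rawProps : {};
    const normalized: JsonSchema = {};
    for (const [name, sub] of Object.entries(props)) {
      normalized[name] = isSchema(sub) ? normalizeNode(sub) : sub;
    }
    out.properties = normalized;
    out.additionalProperties = false; // applied to nested objects as well
  }
  const items = out.items;
  if (out.type === "array" && isSchema(items)) out.items = normalizeNode(items);
  return out;
}

function normalizeSchemaForOpenAI(schema: JsonSchema): JsonSchema {
  // This sketch rejects non-object inputs instead of wrapping them.
  if (schema.type !== "object") {
    throw new Error("OpenAI tool parameters must be an object schema");
  }
  return normalizeNode(schema);
}
```

Throwing on a non-object top level surfaces a badly shaped operation at registration time; wrapping the schema in a synthetic object would be the alternative if operations with scalar inputs need to be exposed as tools.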
 ### Streaming format for the proxy
 The hub's proxy emits SSE events using a **subset** of the AI SDK's `UIMessageChunk` protocol. We emit only the chunk types we need:
 **Content events**:
 - `text-start`, `text-delta`, `text-end` — text content
 - `reasoning-start`, `reasoning-delta`, `reasoning-end` — reasoning content
 **Tool call lifecycle events**:
 - `tool-input-start` — the LLM is calling a tool (includes `toolCallId`, `toolName`)
 - `tool-input-delta` — streaming tool arguments (JSON fragments)
 - `tool-input-available` — complete tool arguments received (parsed JSON)
 - `tool-output-available` — tool execution result (emitted after registry.execute())
 - `tool-output-error` — tool execution error
 **Step and message boundary events**:
 - `start` — message begins (includes optional `messageId`)
 - `step-start` — new step begins (after tool results are fed back)
 - `step-finish` — step ends (after LLM response, before tool execution)
 - `finish` — message complete (includes `finishReason`, `usage` tokens, `metadata`)
 **Error events**:
 - `error` — stream error (includes `errorText`)
 **Two streaming paths produce the same output format**:
 1. **Proxy path** (dev spoke or external client → Hono HTTP endpoint → provider): The proxy receives OpenAI SSE chunks and transforms them into hub chunk format. This is the SSE handler in the proxy.
 2. **Direct agent path** (hub process → `openai` SDK → proxy → provider): The `AgentLoop` consumes the `openai` SDK's streaming response and emits the same hub chunk format. The internal format is the same; only the input source differs.
 Both paths emit the same SSE format to clients. The direct agent path has the additional responsibility of tool execution and loop management, but the streaming event vocabulary is identical.
 **Tool argument accumulation**: On the proxy path, the hub emits `tool-input-delta` events as argument fragments arrive, and a client that wants partial arguments is responsible for concatenating the JSON fragments. On the direct agent path, note that `client.chat.completions.create({ stream: true })` yields raw chunks whose `tool_calls[].function.arguments` fragments must be concatenated per tool call `index`; the SDK's `client.chat.completions.stream()` helper (under `client.beta` in older releases) performs this accumulation. On both paths, the `tool-input-available` event contains the complete parsed JSON input.
 **Finish event includes usage data**: The `finish` event includes `usage` with `{ promptTokens, completionTokens, totalTokens }` and `finishReason` (`"stop"`, `"tool-calls"`, `"length"`, `"max-steps"`, `"error"`).
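The transformation from OpenAI stream chunks to hub chunk events can be sketched as a small state machine. The input type is a trimmed view of the OpenAI chat completion chunk; `ChunkTransformer` and the hub-side field names (`id`, `delta`, `inputTextDelta`) are assumptions for illustration, and reasoning, step, and `finish` events are left out.

```typescript
// Hypothetical sketch: OpenAI stream chunks in, hub chunk events out.
type OpenAIChunk = {
  choices: {
    delta: {
      content?: string | null;
      tool_calls?: { index: number; id?: string; function?: { name?: string; arguments?: string } }[];
    };
    finish_reason: string | null;
  }[];
};

type HubChunk =
  | { type: "text-start"; id: string }
  | { type: "text-delta"; id: string; delta: string }
  | { type: "text-end"; id: string }
  | { type: "tool-input-start"; toolCallId: string; toolName: string }
  | { type: "tool-input-delta"; toolCallId: string; inputTextDelta: string }
  | { type: "tool-input-available"; toolCallId: string; toolName: string; input: unknown };

class ChunkTransformer {
  private textId: string;
  private textOpen = false;
  // In-flight tool calls, keyed by the `index` OpenAI assigns within a response.
  private tools = new Map<number, { id: string; name: string; args: string }>();

  constructor(textId = "text-0") {
    this.textId = textId;
  }

  push(chunk: OpenAIChunk): HubChunk[] {
    const out: HubChunk[] = [];
    const choice = chunk.choices[0];
    if (!choice) return out; // e.g. the trailing usage-only chunk
    const { delta } = choice;
    if (delta.content) {
      if (!this.textOpen) {
        this.textOpen = true;
        out.push({ type: "text-start", id: this.textId });
      }
      out.push({ type: "text-delta", id: this.textId, delta: delta.content });
    }
    for (const tc of delta.tool_calls ?? []) {
      let tool = this.tools.get(tc.index);
      if (!tool) {
        // The first fragment of a tool call carries its id and name.
        tool = { id: tc.id ?? `call-${tc.index}`, name: tc.function?.name ?? "", args: "" };
        this.tools.set(tc.index, tool);
        out.push({ type: "tool-input-start", toolCallId: tool.id, toolName: tool.name });
      }
      const fragment = tc.function?.arguments ?? "";
      if (fragment) {
        tool.args += fragment;
        out.push({ type: "tool-input-delta", toolCallId: tool.id, inputTextDelta: fragment });
      }
    }
    if (choice.finish_reason !== null) {
      if (this.textOpen) {
        this.textOpen = false;
        out.push({ type: "text-end", id: this.textId });
      }
      for (const tool of this.tools.values()) {
        // JSON.parse throws on malformed arguments; a real handler would emit an error event.
        const input: unknown = JSON.parse(tool.args || "{}");
        out.push({ type: "tool-input-available", toolCallId: tool.id, toolName: tool.name, input });
      }
      this.tools.clear();
    }
    return out;
  }
}
```

Both streaming paths could share a transformer like this, with the proxy feeding it parsed SSE `data:` payloads and the direct agent feeding it the SDK's chunk objects.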
 ### Dependencies removed
 | Package | Version | Notes |
 |---------|---------|-------|
 | `ai` | (was planned) | Core AI SDK — streaming, tool calling, UIMessage |
 | `@ai-sdk/openai-compatible` | (was planned) | Provider for OpenAI-compatible APIs |
 | `@ai-sdk/provider` | (transitive) | Provider interface |
 | `@ai-sdk/provider-utils` | (transitive) | Provider utilities |
 | `zod` | (peer dep) | No longer needed as AI SDK peer dep — we use TypeBox |
 ### Dependencies added
 | Package | Version | Purpose |
 |---------|---------|---------|
 | `openai` | Pinned in deno.json | Direct OpenAI API client, zero runtime deps |
 Per project convention (AGENTS.md: "Pin dependency versions in deno.json — update manually when needed"), the `openai` package will be pinned to a specific version.
 ### Documents requiring update
 | Document | Change | Status |
 |----------|--------|--------|
 | `AGENTS.md` | Remove AI SDK from External Dependencies and Constraints. Add `openai` with pinned version. Update `src/inference/` description. | ✅ Done |
 | `docs/architecture/agent-sessions.md` | Remove `streamText`/`generateText`/`proxyProvider`/`operationToTool` references. Replace with `AgentLoop` using `openai` SDK and `operationToOpenAITool` mapping. Update session data shapes. | Pending |
 | `docs/architecture/open-questions.md` | Add OQ-63, OQ-64, OQ-65. Add Theme 11. Add ADR-018 to resolved table. Add inference chain to cross-cutting dependencies. | ✅ Done |
 | `docs/architecture/packages.md` | Replace "Agent sessions (AI SDK)" with "Agent sessions (openai SDK + AgentLoop)" or similar. | Pending |
 ## Consequences
 ### Positive
 1. **Reduced supply chain attack surface**: Zero transitive dependencies from the LLM calling path. The `openai` package has zero runtime dependencies and is auto-generated from OpenAPI spec.
 2. **No AI SDK release cadence coupling**: We update the `openai` package on our schedule, not at 2-5 releases/day.
 3. **Reduced bundle size**: The AI SDK core (`ai`) is ~50 kB minified, `@ai-sdk/provider` adds ~19.5 kB, plus `@ai-sdk/provider-utils` and transitive deps. The `openai` package is ~129.5 kB but has zero transitive deps, so the total install footprint is significantly smaller than `ai` plus its dependency tree. More importantly, the hub's own streaming code (~550 LOC: ~250 for the SSE transformer and ~300 for the AgentLoop) is a fraction of the AI SDK's ~2700 lines of `streamText()` alone, and we only ship what we use.
 4. **Hub-own streaming protocol**: We define and evolve the SSE chunk types we need without waiting for AI SDK releases. New part types or chunk types can be added immediately.
 5. **Simpler code paths**: No `proxyProvider()` factory, no `operationToTool()` adapter, no `LanguageModelV3` interface implementation. Direct `openai` SDK calls + JSON Schema tool definitions + explicit `AgentLoop`.
 6. **Consistent with existing patterns**: The operations registry already uses TypeBox → JSON Schema. Mapping operations to OpenAI tool format is a JSON Schema transform, not an adapter to a third-party type system.
 7. **Consistent with ADR-015 and ADR-016**: We've removed opencode's influence on the hub's data model. Removing the AI SDK continues this pattern: the hub owns its types, its streaming protocol, and its tool calling format.
 8. **Explicit agent loop**: The `AgentLoop` is hub code that we can debug, extend, and add observability to. Multi-step tool execution, max steps, error recovery, and usage tracking are all visible and modifiable. The AI SDK's `streamText()` hides this loop inside ~2700 lines of framework code.
 ### Negative
 1. **More code to maintain**: The `AgentLoop`, streaming state machine, and tool execution orchestration are additional hub code. However, this code is bounded (~1100 lines total), well-understood (LLM → tool call → execute → feed result → repeat), and has clear input/output formats. The AI SDK's equivalent is ~2700 lines of `streamText()` + the provider abstraction + the tool framework.
 2. **No multi-provider abstraction**: The AI SDK lets you swap providers with one line (`anthropic(...)` → `openai(...)`). With the `openai` SDK, we're locked to OpenAI-compatible APIs. But the hub's proxy already abstracts this — all LLM calls go through `/v1/chat/completions`, and the proxy routes to providers. Adding a new provider means adding a route in the proxy, not swapping AI SDK providers. For providers that don't support OpenAI-compatible APIs (e.g., Anthropic native), the proxy translates the format.
 3. **No AI SDK React hooks**: We can't use `useChat` or `useCompletion` on the frontend. The hub doesn't have a React frontend — it has an API server. Frontend concerns are out of scope.
 4. **Tool calling type safety**: The AI SDK's `tool()` function provides Zod-based type safety for tool input/output. We lose that. But our operations registry already provides TypeBox-based type safety — we're mapping TypeBox schemas to OpenAI's `parameters` field, which is JSON Schema (which TypeBox produces natively).
 ### Implementation scope
 | Component | Estimated effort | Notes |
 |-----------|-----------------|-------|
 | `UIMessage` + `UIPart` + `ToolCallState` type definitions | Small (~60 lines) | Plain TypeScript interfaces |
 | `operationToOpenAITool()` + schema normalization | Small-Medium (~80 lines) | JSON Schema normalization for OpenAI strict mode (~30-50 lines) + mapping |
 | OpenAI proxy SSE handler (Hono) | Medium (~250 lines) | Transform OpenAI SSE → hub chunk format, includes step boundary events |
 | `AgentLoop` — multi-step tool execution loop | Medium (~300 lines) | Step management, tool call detection, tool execution via registry, result feeding, max steps, usage accumulation |
 | Direct agent stream consumer | Small (~80 lines) | Consume `openai` SDK streaming response, emit hub chunk events |
 | Part persistence from stream | Medium-Large (~250 lines) | Map stream chunks to `parts` table inserts/updates, buffered write strategy (flush on `*-end` events), state transitions |
 | Proxy key routing | Small (~50 lines) | Resolve `clients` + `client_secrets` for provider keys |
 | Error handling + retry logic | Small-Medium (~80 lines) | Exponential backoff for 429/5xx, non-retryable error mapping |
 **Total: ~1100 lines** of focused, well-bounded code with clear input/output formats.
 The `AgentLoop` is the most significant component. Its contract is simple:
 - **Input**: messages + tool definitions + model config
 - **Output**: SSE stream of hub chunk events + final UIMessage + usage data
 - **Loop**: call → accumulate → detect tools → execute → feed → repeat
 The AI SDK's `streamText()` handles this loop in ~2700 lines (including provider abstraction, middleware hooks, multi-model smoothing, and edge cases we don't need). Our `AgentLoop` handles exactly our use case in ~300 lines.
 ### Open questions affected
 | OQ | Impact |
 |----|--------|
 | OQ-16 | **Simplified**: ADR-016 resolved this — hub owns its schema. This ADR extends that to TypeScript types. The hub defines `UIMessage`, `UIPart`, and `ToolCallState` types. |
 | Agent sessions architecture (`agent-sessions.md`) | **Needs update**: Remove `streamText`/`generateText`/`proxyProvider`/`operationToTool` references. Replace with `AgentLoop` using `openai` SDK and `operationToOpenAITool` mapping. Document the two streaming paths producing the same output format. |
 | `AGENTS.md` Constraints and Dependencies | **Updated in this change**: AI SDK removed from dependencies and constraints. `openai` package added (version to be pinned when the dependency is added). `src/inference/` description updated. |
 ### Open questions created
 | ID | Question | Priority |
 |----|----------|----------|
 | OQ-63 | What is the exact subset of `UIMessageChunk` types the hub proxy emits? (This ADR lists the initial subset, but extensions will happen as features are added.) | medium |
 | OQ-64 | Should the direct agent use the `openai` SDK's streaming API or raw HTTP for more control? The `openai` SDK provides a convenient typed interface, but raw HTTP gives more control over SSE parsing for the proxy path. | low |
 | OQ-65 | What is the buffered write strategy for part persistence? Options: flush on `*-end` events (per-part commits), flush on `step-finish` (per-step commits), or flush on `finish` (per-message commits). Per-step balances latency and write volume. | medium |
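To make the per-step option in OQ-65 concrete, here is a minimal sketch of a buffered writer that accumulates part updates in memory and persists them once per step. The `PartWrite` and `StreamEvent` shapes and the event names are assumptions modeled on the names used in this ADR; the real `parts` columns come from ADR-016 and are not reproduced here.

```typescript
// Hypothetical shapes: stand-ins for the real `parts` row and hub chunk types.
type PartWrite = { partId: string; state: "streaming" | "done"; text: string };
type StreamEvent =
  | { type: "text-delta"; partId: string; delta: string }
  | { type: "text-end"; partId: string }
  | { type: "step-finish" };

// Returns an event handler that buffers part updates and calls `persist`
// with one batch per step (the "flush on step-finish" option).
function createStepBufferedWriter(persist: (batch: PartWrite[]) => void) {
  const pending = new Map<string, PartWrite>();

  return function handle(event: StreamEvent): void {
    if (event.type === "text-delta") {
      // Accumulate text in memory instead of writing on every delta.
      const part: PartWrite =
        pending.get(event.partId) ?? { partId: event.partId, state: "streaming", text: "" };
      part.text += event.delta;
      pending.set(event.partId, part);
    } else if (event.type === "text-end") {
      // State transition only; the write still waits for the step boundary.
      const part = pending.get(event.partId);
      if (part) part.state = "done";
    } else {
      // step-finish: flush everything buffered during this step as one batch.
      if (pending.size > 0) persist(Array.from(pending.values()));
      pending.clear();
    }
  };
}
```

Switching to per-part commits would move the `persist` call into the `text-end` branch; per-message commits would trigger it on a final `finish` event instead.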
 ## References
 - [ADR-015: Dev spoke instead of opencode integration](ADR-015-dev-spoke-not-opencode.md) — removed opencode dependency
 - [ADR-016: Hub-own schema](ADR-016-hub-own-schema.md) — hub owns session/message/part schema
 - [AI SDK supply chain risk assessment](../research/ai-sdk-supply-chain-risk.md) — detailed analysis of AI SDK risks
 - [agent-sessions.md](../architecture/agent-sessions.md) — current session architecture (references AI SDK)
 - [OpenAI Node SDK](https://github.com/openai/openai-node) — zero-dependency, auto-generated from OpenAPI spec
--- a/docs/research/ai-sdk-supply-chain-risk.md
+++ b/docs/research/ai-sdk-supply-chain-risk.md
@@ -0,0 +1,257 @@
 # Research: Vercel AI SDK Supply Chain Risk Assessment
 ## Question
 Should we use the Vercel AI SDK (`ai` npm package + `@ai-sdk/openai-compatible`) as our LLM integration layer, or should we use OpenAI's SDK directly (`openai` npm package)? What are the supply chain risks?
 ## Executive Summary
 The Vercel AI SDK presents **moderate supply chain risk**: not negligible, not critical. The April 2026 Vercel security incident is a real concern, but npm packages were confirmed uncompromised. The dependency tree is shallow and well-scoped. The main risk vectors are: (1) extreme release cadence creating surface area, (2) the Vercel corporate attack surface after the April 2026 breach, and (3) the unusual `@workflow/serde` dependency, which appears in the GitHub source but not in the published npm package. Using the OpenAI SDK directly eliminates most of these risks at the cost of more boilerplate and no multi-provider abstraction.
 ---
 ## 1. Release Frequency and Pattern
 ### Findings
 The `ai` npm package has released **1,224 versions total**. In 2026 alone:
 | Month | Releases |
 |-------|----------|
 | 2026-01 | 81 |
 | 2026-02 | 56 |
 | 2026-03 | 110 |
 | 2026-04 | 109 |
 | 2026-05 | 58 (partial) |
 **That's approximately 2-5 releases per day across stable + canary + beta channels.**
 The latest stable is `ai@6.0.191` (published May 22, 2026). The canary channel sits at `7.0.0-canary.152`. They maintain 3 concurrent version lines: v5, v6, and v7-canary.
 **Release automation**: The release process is fully automated via GitHub Actions using Changesets (`pnpm changeset` → PR → auto-merge → auto-publish). The release workflow (`.github/workflows/release.yml`) triggers on pushes to `main` that include `.changeset/` changes. The `github-actions[bot]` account is the #2 contributor (773 commits). npm provenance is enabled (`publishConfig.provenance: true`).
 ### Risk Assessment
 - **High release cadence = high surface area**: 2-5 releases/day means constant churn on the supply chain. Pinning is essential.
 - **Automated releases via bot**: The release process is CI/CD automated with Changesets, which is good for consistency but means any compromise of the CI pipeline could push malicious packages.
 - **Positive**: npm provenance is enabled, meaning npm publishes are linked to GitHub Actions runs and specific commits. This provides verifiable attestation.
 ---
 ## 2. Known Supply Chain Incidents
 ### CVE-2025-48985 (Low Severity, 3.7 CVSS)
 A filetype allowlist bypass vulnerability in the AI SDK's file upload functionality. Fixed in versions 5.0.52, 5.1.0-beta.9, and 6.0.0-beta. This is relevant if using the AI SDK's file upload features; not relevant for our inference proxy use case.
 ### Vercel April 2026 Security Incident (HIGH significance)
 **This is the most significant finding.**
 - **What happened**: A Vercel employee's Google Workspace account was compromised via a supply chain attack on Context.ai (a third-party AI tool). The attacker used a compromised OAuth token to pivot into Vercel's internal systems.
 - **Impact**: Non-sensitive environment variables were compromised. A threat actor claimed to have obtained a Vercel database access key and partial source code, selling data on BreachForums for $2M.
 - **npm packages confirmed safe**: Vercel, in collaboration with GitHub, Microsoft, npm, and Socket, confirmed that no npm packages published by Vercel were compromised.
 - **But**: The attacker had access to Vercel's internal systems, including potentially npm publish tokens. Vercel rotated all credentials.
 - **Risk remains**: The breach revealed that Vercel's internal systems are a target. If the attacker had accessed npm publish tokens before rotation, packages could have been poisoned.
 ### September 2025 npm Supply Chain Attack
 Vercel published a response to the wider npm ecosystem attack that compromised `chalk`, `debug`, and 16 other packages. Vercel was not the origin — this was an ecosystem-wide incident. Vercel purged build caches for 76 affected projects.
 ### No AI SDK-specific package poisoning
 There is no evidence that any `ai` or `@ai-sdk/*` package has ever been directly compromised or published with malicious code.
 ---
 ## 3. Dependency Tree
 ### `ai@6.0.191` (core package)
 ```
 ai@6.0.191
 ├── @ai-sdk/gateway@3.0.120
 │   ├── @ai-sdk/provider@3.0.10
 │   │   └── json-schema@0.4.0           (leaf, zero deps)
 │   ├── @ai-sdk/provider-utils@4.0.27
 │   │   ├── @ai-sdk/provider@3.0.10      (duplicate, same)
 │   │   ├── @standard-schema/spec@1.1.0  (leaf, zero deps)
 │   │   └── eventsource-parser@3.0.8     (leaf, zero deps)
 │   └── @vercel/oidc@3.4.1              (leaf, zero deps after checking)
 ├── @ai-sdk/provider@3.0.10             (duplicate)
 ├── @ai-sdk/provider-utils@4.0.27      (duplicate)
 └── @opentelemetry/api@1.9.1            (leaf, zero deps)
 Peer dependency:
 └── zod@^3.25.76 || ^4.1.8
 ```
 ### `@ai-sdk/openai-compatible@2.0.48` (stable)
 ```
 @ai-sdk/openai-compatible@2.0.48
 ├── @ai-sdk/provider@3.0.10
 └── @ai-sdk/provider-utils@4.0.27
 Peer dependency:
 └── zod@^3.25.76 || ^4.1.8
 ```
 ### Notable observations
 | Dependency | Assessment |
 |-----------|------------|
 | `json-schema@0.4.0` | **Old** (last meaningful update was years ago). Single-maintainer risk. Used only for JSON Schema type definitions in `@ai-sdk/provider`, not for runtime data processing. |
 | `@workflow/serde@4.1.0` | **Unusual** — this appeared in the GitHub source (`provider-utils/package.json`) but is NOT in the published npm version. Likely removed during build/publish. This is a Vercel-internal workflow library. |
 | `eventsource-parser@3.0.8` | Single-purpose, well-maintained SSE parser. Zero deps. Low risk. |
 | `@standard-schema/spec@1.1.0` | New standard schema specification. Zero deps. Low risk. |
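 | `@vercel/oidc@3.4.1` | Vercel-specific OIDC library. Installed transitively through `@ai-sdk/gateway` (a direct dependency of `ai` in the tree above), but only relevant when the gateway is actually used. Low risk for our use case. |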
 | `@opentelemetry/api@1.9.1` | Standard OpenTelemetry interface. Zero deps. Well-governed CNCF project. Low risk. |
 | `zod` (peer dep) | Standard validation library. Already in our stack. |
 **Dependency depth**: 3 levels maximum. Most paths are 2 levels deep. This is **good** — shallow tree means fewer transitive attack surfaces.
 **Concerning dependencies**: `json-schema@0.4.0` is the one to watch. It's old, unmaintained, and a single-maintainer package. However, it's only used for JSON Schema validation type definitions in `@ai-sdk/provider`, not for runtime data processing, so the blast radius is limited.
 ---
 ## 4. Maintainer and Governance Model
 ### Core Team
 | Contributor | Commits | Role |
 |------------|---------|------|
 | lgrammel | 1,980 | Lead maintainer (Vercel employee) |
 | github-actions[bot] | 773 | Automated CI/CD |
 | gr2m | 352 | Contributor (also at Vercel) |
 | nicoalbanese | 352 | Contributor (Vercel employee) |
 | shaper | 285 | Contributor |
 | dancer | 274 | Contributor |
 **Bus factor: 1.** Lars Grammel (lgrammel) is the overwhelmingly dominant contributor. The project is a Vercel corporate project, not a community foundation project. Vercel has financial incentive to maintain it, but the knowledge concentration is extreme.
 ### Governance
 - **License**: Apache-2.0 (permissive, good)
 - **Published via**: `vercel-release-bot` npm account
 - **No CODEOWNERS file** found in the repository
 - **No formal governance model** — it's a corporate open-source project with Vercel making all decisions
 - **No security policy file** (SECURITY.md) found in the repo root
 ---
 ## 5. Build and Publish Process
 | Aspect | Detail | Risk |
 |--------|--------|------|
 | **Source verifiable** | Yes — all code is on `github.com/vercel/ai`, publishes from CI | Low |
 | **npm provenance** | Enabled (`publishConfig.provenance: true`) | Low |
 | **Reproducible builds** | No — builds include `pnpm clean && tsup` step, not hermetic | Medium |
 | **Build toolchain** | `tsup` (esbuild-based bundler), `pnpm`, `vitest`, `turbo` | Low |
 | **CI environment** | GitHub Actions with `id-token: write` for OIDC provenance | Low |
 | **Release trigger** | Automated on push to `main` with changeset files | Medium (auto-merge risk) |
 | **Verified commits** | GitHub verified signatures on releases | Low |
 | **Lockfile integrity** | `pnpm-lock.yaml` committed to repo | Low |
 **Positive**: npm provenance is a significant supply chain security feature. It creates a verifiable link between the npm package and the specific GitHub Actions run and commit that produced it.
 **Concerning**: Auto-merge release PRs (`.github/workflows/auto-merge-release-prs.yml`) means that any change that gets a changeset merged to `main` will be automatically published. A compromised maintainer account or a malicious PR could result in a poisoned package.
 ---
 ## 6. Alternatives: OpenAI SDK Direct
 ### `openai` npm package
 | Aspect | AI SDK (`ai` + `@ai-sdk/openai-compatible`) | OpenAI SDK (`openai`) |
 |-------|-------|--------|
 | **Version** | 6.0.191 | 6.39.0 |
 | **Release cadence** | 2-5/day (all channels) | ~1/week |
 | **Runtime dependencies** | 6 direct (3 workspace, 3 external) + transitive | **0** (zero runtime deps) |
 | **Dependency depth** | 3 levels | 0 levels |
 | **Peer dependencies** | zod | None |
 | **npm provenance** | Yes | Yes |
 | **License** | Apache-2.0 | Apache-2.0 |
 | **Maintainer** | Vercel (corporate, 1 dominant dev) | OpenAI (corporate, auto-generated from OpenAPI spec) |
 | **Node.js requirement** | >=22 | >=20 |
 | **Bundle size** | ~19.5 kB (provider), ~50 kB (core) | ~129.5 kB |
 | **Build system** | tsup (esbuild-based bundler) | Stainless (auto-generated SDK) |
 | **Source code** | github.com/vercel/ai (open) | github.com/openai/openai-node (open) |
 ### Dependency Tree Comparison
 ```
 ai@6.0.191 (Vercel AI SDK)
 └── 6 direct deps
    └── ~8 transitive deps
        ├── json-schema@0.4.0 (unmaintained)
        ├── eventsource-parser@3.0.8 (single-purpose)
        ├── @standard-schema/spec@1.1.0 (spec only)
        ├── @opentelemetry/api@1.9.1 (CNCF)
       └── @vercel/oidc@3.4.1 (Vercel-specific)
 openai@6.39.0 (OpenAI SDK)
 └── 0 runtime dependencies
 ```
 ### Trade-offs
 | Factor | AI SDK | OpenAI SDK |
 |--------|--------|------------|
 | **Multi-provider abstraction** | ✓ Switch providers with 1 line | ✗ Limited to OpenAI-compatible endpoints (via `baseURL`) |
 | **Streaming helpers** | ✓ Built-in `streamText`, React hooks | ✗ Manual SSE handling |
 | **Structured output / tool calling** | ✓ Type-safe with Zod schemas | ✗ Manual JSON Schema construction |
 | **Supply chain surface** | Medium (6+ deps, Vercel corporate risk) | **Minimal** (zero deps) |
 | **Type safety** | End-to-end (Zod integration) | API boundary only |
 | **Edge runtime** | Supported, not required (streaming also works on Node.js) | Both Node.js and Edge |
 | **Agent patterns** | Built-in (`ToolLoopAgent`, `generateText`) | Not included (use OpenAI Agents SDK separately) |
 | **Future multi-model** | Easy provider swap | Requires complete rewrite |
 | **API update speed** | Community-maintained adapters | Auto-generated from OpenAI spec |
 ### What It Would Take to Switch
 For our use case (inference proxy for an OpenAI-compatible API):
 1. **Replace `@ai-sdk/openai-compatible`** → Create a thin adapter that implements the same `LanguageModelV4Spec` interface but wraps `openai` SDK calls directly
 2. **Replace `streamText`/`generateText`** → Use `openai` SDK's streaming API directly with our own stream framing
 3. **Replace tool calling** → Use OpenAI's tool calling API directly (JSON Schema definitions, manual response parsing)
 4. **Replace Zod integration** → Use our existing `@alkdev/typebox` schemas, convert to JSON Schema for OpenAI API calls
 5. **Estimated effort**: 2-3 days for a minimal proxy, 1-2 weeks for full feature parity including streaming responses
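Steps 3 and 4 above amount to wrapping an existing JSON Schema in OpenAI's function-tool envelope. The sketch below assumes a hypothetical `Operation` shape (the hub's real operation type is not shown in this document) and treats the input schema as a plain JSON Schema object, which is what a TypeBox-style schema serializes to.

```typescript
// Shape of a function tool in the OpenAI Chat Completions API.
type OpenAIFunctionTool = {
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters: Record<string, unknown>;
  };
};

// Hypothetical operation shape, used here for illustration only.
type Operation = {
  name: string;
  description: string;
  input: Record<string, unknown>; // JSON Schema for the operation's arguments
};

function operationToOpenAITool(op: Operation): OpenAIFunctionTool {
  // Round-trip through JSON to get a plain copy and drop any non-JSON properties
  // (for example symbol-keyed metadata, if the schema library attaches any).
  const parameters = JSON.parse(JSON.stringify(op.input)) as Record<string, unknown>;
  return {
    type: "function",
    function: { name: op.name, description: op.description, parameters },
  };
}
```

Because the schema is already JSON Schema, no Zod-style conversion layer is needed; the mapping is a rename plus a defensive copy.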
 ---
 ## Risk Summary
 | Risk | Likelihood | Impact | Mitigation |
 |------|-----------|--------|------------|
 | Vercel npm token compromise (post-April 2026) | Low-Medium | Critical | Pin exact versions, verify npm provenance, use lockfile |
 | `json-schema@0.4.0` supply chain | Low | Low | Only type definitions, not runtime execution |
 | Extreme release cadence causing regression | Medium | Medium | Pin versions, test before upgrade |
 | Bus factor (lgrammel dominance) | Medium | Medium | Pin versions, fork if needed (Apache-2.0) |
 | Auto-merge release pipeline compromise | Low | Critical | Verify provenance, audit CI pipeline |
 | `@workflow/serde` / `@vercel/oidc` | Low | Low | Not in our dependency path (only gateway) |
 | Breaking changes across parallel version lines | Medium | Medium | Pin to v6 stable, lockfile |
 ## Recommendation
 **Use the AI SDK, but with supply chain hardening:**
 1. **Pin exact versions** in `deno.json` — never use `^` ranges. Example: `"ai": "6.0.191"` not `"ai": "^6.0.191"`.
 2. **Verify npm provenance** — check that published packages match their GitHub source commits.
 3. **Do not use `@ai-sdk/gateway`** — it brings in `@vercel/oidc` which is unnecessary for our use case and adds Vercel-specific infrastructure coupling.
 4. **Use `@ai-sdk/openai-compatible`** specifically, not `@ai-sdk/openai` — the compatible provider is more generic and avoids OpenAI-specific code paths.
 5. **Set up automated dependency auditing** — run `npm audit` or Socket.dev scanning in CI.
 6. **Monitor the Vercel security bulletins** — subscribe to https://vercel.com/kb/bulletin.
 7. **Have a migration plan** — if supply chain concerns escalate, be ready to switch to the OpenAI SDK directly. The primary value we get from AI SDK is streaming abstractions and tool calling types, which can be reimplemented.
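Item 1 can be enforced mechanically. The sketch below is one possible CI check, assuming dependencies are declared as `npm:` specifiers in an import map such as the `imports` field of `deno.json`; the function names are hypothetical and the check ignores non-npm entries.

```typescript
// Matches an exact semver version, optionally with a prerelease tag (e.g. 7.0.0-canary.152).
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

// True when the specifier ends in "@<exact version>", e.g. "npm:ai@6.0.191".
// Ranges ("^6.0.191", "~6.0", "6") and missing versions return false.
function isExactlyPinned(specifier: string): boolean {
  const version = specifier.slice(specifier.lastIndexOf("@") + 1);
  return EXACT_VERSION.test(version);
}

// Returns the import names whose npm specifier is not pinned to an exact version.
function findUnpinned(imports: Record<string, string>): string[] {
  return Object.keys(imports).filter((name) => {
    const specifier = imports[name];
    return specifier.indexOf("npm:") === 0 && !isExactlyPinned(specifier);
  });
}
```

A CI step could load the config, call `findUnpinned`, and fail the build if the returned list is non-empty.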
 **If risk tolerance is lower**: Use the `openai` SDK directly. Zero dependencies, simpler supply chain, auto-generated from OpenAPI spec. The trade-off is more boilerplate for streaming and no multi-provider abstraction — but since we're already using `@alkdev/typebox` and have our own operation patterns, the AI SDK's value add is primarily in stream framing, which is ~200 lines of code to replicate.
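For a sense of what "stream framing" involves, the core is encoding each chunk as a server-sent event and decoding frames that may arrive split across network reads. The sketch below is a simplified illustration under assumed names (`Chunk`, `encodeFrame`, `decodeFrames`); it handles only `data:` lines and the `[DONE]` sentinel used by OpenAI-style streams, not the full SSE field set.

```typescript
// Hypothetical chunk shape: any JSON object with a `type` discriminator.
type Chunk = { type: string; [key: string]: unknown };

// One SSE frame per chunk: a single `data:` line terminated by a blank line.
function encodeFrame(chunk: Chunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Parses every complete frame in `buffer` and returns the unconsumed tail,
// so the caller can prepend it to the next network read.
function decodeFrames(buffer: string): { chunks: Chunk[]; rest: string } {
  const chunks: Chunk[] = [];
  let rest = buffer;
  let end = rest.indexOf("\n\n");
  while (end !== -1) {
    const frame = rest.slice(0, end);
    rest = rest.slice(end + 2);
    // Join all data lines of the frame, stripping the field name and one leading space.
    const data = frame
      .split("\n")
      .filter((line) => line.indexOf("data:") === 0)
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    // Skip comment-only frames and the end-of-stream sentinel.
    if (data !== "" && data !== "[DONE]") chunks.push(JSON.parse(data) as Chunk);
    end = rest.indexOf("\n\n");
  }
  return { chunks, rest };
}
```

Returning the remainder instead of throwing on partial input is what lets the same decoder sit behind both a raw HTTP reader and a proxy path.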
 ## References
 - Vercel AI SDK GitHub: https://github.com/vercel/ai
 - Vercel April 2026 Security Bulletin: https://vercel.com/kb/bulletin/vercel-april-2026-security-incident
 - OX Security analysis of Vercel breach: https://www.ox.security/blog/vercel-context-ai-supply-chain-attack-breachforums/
 - CVE-2025-48985 (AI SDK file upload bypass): https://advisories.gitlab.com/pkg/npm/ai/
 - Vercel Sept 2025 npm supply chain response: https://vercel.com/blog/critical-npm-supply-chain-attack-response-september-8-2025
 - AI SDK release process (DeepWiki): https://deepwiki.com/vercel/ai/6.3-release-process-and-version-management
 - OpenAI Node SDK: https://github.com/openai/openai-node