glm-5.1 85ff26ae51 ADR-018 amendment: meta-tools replace per-operation tool conversion

Instead of flattening the entire operation registry into individual
OpenAI tool definitions (with schema normalization headaches), the
AgentLoop provides the same 4 discovery+call meta-tools that the MCP
server already exposes: hub.list, hub.search, hub.schema, hub.call.

This eliminates operationToOpenAITool() + normalizeSchemaForOpenAI()
entirely. The LLM discovers and calls operations the same way an MCP
client does. Same tool defs in context always (4, not N), same call
protocol path (callMap.call()), unified interface for both MCP and
direct agent consumers.

2026-05-27 05:22:49 +00:00


ADR-018: No AI SDK — direct OpenAI proxy with hub-own streaming

Status: Accepted
Date: 2026-05-26
Amended: 2026-05-27 — meta-tools replace per-operation tool conversion
Deciders: alkdev

Context

The hub was architected with the Vercel AI SDK (ai package + @ai-sdk/*) as a core dependency for LLM streaming. agent-sessions.md describes direct agents using streamText()/generateText() with proxyProvider() and operationToTool() bridging the operations registry to AI SDK tools. ADR-016 made AI SDK UIMessage the primary design constraint for the session/message/part schema.

However, the AI SDK has zero runtime integration today — it appears only in architecture docs and deno.json has no ai import. The hub's src/inference/ directory doesn't exist yet. This is the right time to remove it before it becomes entrenched.

Supply chain risk

The AI SDK presents moderate supply chain risk:

Extreme release cadence: 2-5 releases/day across 3 version lines (1,224 total npm versions). Every release is surface area for compromise or regression.
April 2026 Vercel security incident: A threat actor compromised a Vercel employee's Google Workspace account via a supply chain attack on Context.ai, gaining access to Vercel's internal systems. npm publish tokens were rotated after the breach. While no ai packages were confirmed compromised, the attack vector is real.
Bus factor of 1: One dominant contributor (Lgrammel, 1,980 commits — 5x the #2 contributor). No CODEOWNERS file, no formal governance model.
Transitive dependency concerns: json-schema@0.4.0 is unmaintained with a single maintainer. @vercel/oidc is Vercel-specific infrastructure coupling (though only in @ai-sdk/gateway, which we wouldn't use).
Automated release pipeline: Changesets auto-merge and auto-publish. A compromised maintainer account or malicious PR could publish a poisoned package.

For comparison, the openai npm package has zero runtime dependencies, is auto-generated from OpenAPI spec, and releases ~1/week.

Why not "AI SDK with hardening"?

The supply chain risk assessment (ai-sdk-supply-chain-risk.md) recommends "use the AI SDK with supply chain hardening" as its primary option. This ADR goes further and removes the AI SDK entirely. The reasoning:

Zero runtime integration = zero migration cost: The hub has no ai import in any source file. There is nothing to migrate. Removing a planned dependency that hasn't been integrated yet is essentially free; adding it and removing it later would be expensive.
Ownership philosophy: ADR-015 removed opencode because the hub should own its data model and execution model. ADR-016 established hub-own schema ownership. The same principle applies to the streaming protocol and message types — the hub should own these, not have them constrained by a third-party library's release cadence.
The proxy already abstracts provider routing: The hub's OpenAI-compatible proxy (already architecturally committed) routes calls to providers. A new provider means adding a route in the proxy, not swapping AI SDK provider packages. The AI SDK's multi-provider abstraction provides no value in this architecture.
Security is cumulative: Each supply chain attack surface removed is additive. We removed opencode (ADR-015) and reduced the attack surface. Removing the AI SDK continues this. We're building a platform for other people's production workloads — minimizing trust in external packages with high release cadence and corporate attack targets is a reasonable posture.
The code is bounded and well-understood: The AI SDK's streaming protocol is well-specified. Reimplementing the subset that covers the hub's needs takes roughly 1,060 lines of focused code (see Implementation scope). This is not a risky unknown; it is a straightforward SSE transformation with clear input/output formats.

What we actually need from the AI SDK

The AI SDK provides two things the hub's architecture references:

UIMessage format — role + parts array for session messages
streamText()/generateText() — LLM calling with streaming, tool execution, and multi-step agent loops

The tool bridging problem (tool() + operationToTool()) is already solved by the MCP meta-tools model (hub.list/hub.search/hub.schema/hub.call) — the same 4 discovery+call tools that MCP clients use. These become the OpenAI tool definitions for direct agents. No per-operation schema conversion needed.

The proxy is already architecturally committed — agent-sessions.md describes /v1/chat/completions as a Hono HTTP endpoint. The question is whether we call OpenAI-compatible APIs through the AI SDK or directly through the openai npm package.

What removing the AI SDK simplifies

After ADR-015 removed the opencode integration, the AI SDK's role narrowed significantly. The ai-sdk-provider-opencode-sdk package is gone. "Runner agents" now run in the dev spoke — they call the hub's OpenAI proxy directly, no AI SDK involved on their side either.

The only place the AI SDK was used was for "direct agents" running in the hub process. These agents:

Read messages from Postgres
Provide 4 meta-tools (hub.list/search/schema/call) as OpenAI tool definitions
Call an LLM via streamText() (which handles multi-step tool execution internally)
Tool calls route through the call protocol (same path as MCP clients), not per-operation adapters
Persist the response parts back to Postgres

This is a bounded loop that the hub can implement directly, without the AI SDK's multi-provider abstraction, React hooks, or streaming protocol layers.

Decision

Remove the Vercel AI SDK as a dependency. The hub will:

Define its own UIMessage type compatible with the AI SDK's format. ADR-016 already says the hub owns its schema — this extends that ownership to the TypeScript type. The type is a plain interface (role + parts array); there are no runtime dependencies.
Use the openai npm package directly for LLM calls. Zero runtime dependencies, well-maintained, auto-generated from OpenAPI spec, compatible with Deno via npm specifiers.
Use MCP meta-tools as OpenAI tool definitions. Instead of flattening the entire operation registry into individual OpenAI tool definitions (one per operation, with schema normalization headaches), the LLM gets the same 4 discovery+call tools that the MCP server already exposes: hub.list, hub.search, hub.schema, hub.call. The LLM discovers and calls operations the same way an MCP client does. This eliminates operationToOpenAITool() + normalizeSchemaForOpenAI() entirely. See mcp-server.md for the tool definitions and the rationale for this pattern (N operations ≠ N tool defs in context).
Implement hub-own streaming for the proxy's SSE output. The proxy receives OpenAI SSE chunks and transforms them into the hub's stream format — a subset of the AI SDK's UIMessageChunk protocol that covers the part types the hub uses.
Implement the agent execution loop directly. The AI SDK's streamText() handles multi-step tool execution loops internally. The hub will implement this loop explicitly: call LLM → detect tool calls → execute tools via callMap.call() (the call protocol) → feed results back → repeat until the LLM produces a final response with no tool calls.

Architecture changes

Before (AI SDK):

Direct Agent → streamText() → proxyProvider('anthropic/...') → Hub Proxy → Provider
Direct Agent → generateText() → proxyProvider('anthropic/...') → Hub Proxy → Provider
Direct Agent → tool() → operationToTool() → registry.execute()
Dev Spoke → HTTP POST → Hub Proxy → Provider

After (No AI SDK):

Direct Agent → AgentLoop → openai SDK → Hub Proxy → Provider
                         ↕
                hub.call/list/search/schema (4 meta-tools, same as MCP)
                ↕
                callMap.call() (call protocol) → registry.execute()
Dev Spoke → HTTP POST → Hub Proxy → Provider

Both paths go through the same proxy. The proxy adds the provider API key and forwards. The direct agent path uses the openai SDK pointed at localhost (the proxy). The dev spoke path makes HTTP requests to the proxy.

Agent execution loop

The AI SDK's streamText() handles multi-step tool execution internally: detect tool calls → execute → feed results → re-prompt → repeat. Without it, the hub must implement this loop explicitly.

The AgentLoop:

┌─────────────────────────────────────────────────────┐
│  1. Load session messages from Postgres              │
│  2. Convert to OpenAI chat message format            │
│  3. Provide 4 meta-tools as OpenAI tool definitions  │
│     (hub.list, hub.search, hub.schema, hub.call)     │
│  4. Call LLM (via openai SDK, streaming)              │
│  5. Emit stream events to client (SSE)               │
│  6. Accumulate response                              │
│  7. If response contains tool_calls:                  │
│     a. Emit step-finish event                        │
│     b. For each tool_call:                           │
│        - Route through callMap.call() (call protocol) │
│        - This gives call graph tracking, abort, etc.  │
│        - Emit tool-output-available event             │
│     c. Append tool results to messages               │
│     d. Emit step-start event                          │
│     e. Go to step 4                                  │
│  8. If response has no tool_calls:                    │
│     a. Emit finish event (with usage data)            │
│     b. Persist messages and parts to Postgres         │
│     c. Done                                          │
└─────────────────────────────────────────────────────┘

Step boundaries: Each LLM call within a single agent turn is a "step." Steps are bounded by step-start and step-finish SSE events so clients can distinguish between the LLM's initial response and subsequent responses after tool execution.

Max steps: Default 10 (configurable per session/role). Prevents infinite tool call loops. If a turn would exceed the configured maximum, the loop terminates with a finish event containing finishReason: "max-steps".
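The loop, step boundaries, and max-steps cutoff described above can be sketched as follows. This is a minimal illustration, not the hub's implementation: `runAgentLoop`, `LoopDeps`, `callModel`, `callTool`, and `emit` are hypothetical names, `callTool` stands in for `callMap.call()` (whose signature this ADR does not specify), and persistence, streaming deltas, and error handling are left out.

```typescript
// Hypothetical shapes for illustration only.
type ToolCall = { id: string; name: string; args: unknown };
type ModelReply = { text: string; toolCalls: ToolCall[] };
type ChatMessage =
  | { role: "system" | "user" | "assistant"; content: string; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

type LoopDeps = {
  callModel: (messages: ChatMessage[]) => Promise<ModelReply>; // one LLM call = one step
  callTool: (call: ToolCall) => Promise<unknown>; // stands in for callMap.call()
  emit: (event: { type: string; [key: string]: unknown }) => void;
};

async function runAgentLoop(
  history: ChatMessage[],
  deps: LoopDeps,
  maxSteps = 10,
): Promise<{ messages: ChatMessage[]; finishReason: "stop" | "max-steps" }> {
  const messages = [...history];
  for (let step = 0; step < maxSteps; step++) {
    // A new step begins only after tool results were fed back.
    if (step > 0) deps.emit({ type: "step-start" });
    const reply = await deps.callModel(messages);
    messages.push({ role: "assistant", content: reply.text, toolCalls: reply.toolCalls });

    if (reply.toolCalls.length === 0) {
      deps.emit({ type: "finish", finishReason: "stop" });
      return { messages, finishReason: "stop" };
    }

    deps.emit({ type: "step-finish" });
    // All tool calls of a step run concurrently; results keep request order.
    const outputs = await Promise.all(reply.toolCalls.map((call) => deps.callTool(call)));
    reply.toolCalls.forEach((call, i) => {
      deps.emit({ type: "tool-output-available", toolCallId: call.id, output: outputs[i] });
      messages.push({ role: "tool", toolCallId: call.id, content: JSON.stringify(outputs[i]) });
    });
  }
  // The model still wanted tools after the last allowed step.
  deps.emit({ type: "finish", finishReason: "max-steps" });
  return { messages, finishReason: "max-steps" };
}
```

Injecting `callModel` and `callTool` keeps the loop testable without a provider or a registry, which is one way to get the debuggability this ADR is after.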

Error handling: If a tool execution fails, the loop reports the error to the LLM as a tool result with errorText and continues the loop. The LLM can choose to retry, use a different tool, or explain the error to the user. If the LLM call itself fails (rate limit, network error), the hub retries with exponential backoff (max 3 retries for 429/5xx errors). Non-retryable errors (4xx except 429, context window exceeded) are emitted as error stream events and the loop terminates.
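A minimal sketch of the retry policy for the LLM call itself, assuming the thrown error exposes a numeric `status` field. `withRetry`, `isRetryable`, and the 500 ms base delay are hypothetical; the ADR only fixes "max 3 retries for 429/5xx" and exponential backoff.

```typescript
// Hypothetical error shape: the sketch only assumes a numeric HTTP status.
type HttpError = { status: number };

// 429 and 5xx are retried; every other status fails immediately.
function isRetryable(status: number): boolean {
  return status === 429 || status >= 500;
}

async function withRetry<T>(
  attempt: () => Promise<T>,
  options: { maxRetries?: number; baseDelayMs?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<T> {
  const maxRetries = options.maxRetries ?? 3;
  const baseDelayMs = options.baseDelayMs ?? 500; // assumed value, not from the ADR
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let retry = 0; ; retry++) {
    try {
      return await attempt();
    } catch (err) {
      const status = (err as Partial<HttpError> | null)?.status;
      const giveUp = typeof status !== "number" || !isRetryable(status) || retry >= maxRetries;
      if (giveUp) throw err; // caller turns this into an error stream event
      await sleep(baseDelayMs * 2 ** retry); // delay doubles on each retry
    }
  }
}
```

Passing `sleep` as an option lets tests run without real delays. Adding jitter to the delay is a common refinement that is omitted here.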

Usage tracking: The stream_options: { include_usage: true } parameter is sent with each LLM call. Each step's usage data (prompt tokens, completion tokens) is accumulated across all steps, and the total is included in the finish event. The hub's clients entries of type llm-provider store cost metadata; the session's data column records total usage per turn.
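As a sketch of the accumulation, the function below sums per-step usage into the camelCase shape used by the finish event. The snake_case input fields are the ones the Chat Completions API reports in its `usage` object; `accumulateUsage` and the two type names are hypothetical.

```typescript
// Usage as reported by an OpenAI-compatible API for one LLM call (one step).
type StepUsage = { prompt_tokens: number; completion_tokens: number; total_tokens: number };
// Usage as carried by the hub's finish event.
type TurnUsage = { promptTokens: number; completionTokens: number; totalTokens: number };

function accumulateUsage(steps: Array<StepUsage | null | undefined>): TurnUsage {
  const total: TurnUsage = { promptTokens: 0, completionTokens: 0, totalTokens: 0 };
  for (const step of steps) {
    if (!step) continue; // a provider may omit usage; such steps count as zero here
    total.promptTokens += step.prompt_tokens;
    total.completionTokens += step.completion_tokens;
    total.totalTokens += step.total_tokens;
  }
  return total;
}
```

Note that prompt tokens grow with every step because each call resends the conversation so far, so the summed figure reflects billed tokens rather than conversation length.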

Concurrent tool calls: OpenAI responses can include multiple tool calls in a single response. The hub executes all tool calls in a step concurrently (via Promise.all over callMap.call() invocations), collects results, then continues the loop. All tool results are appended to messages before the next LLM call. The LLM can also batch independent calls in a single hub.call invocation (since hub.call accepts an array), which is more token-efficient.
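One way to combine concurrent execution with the error-as-tool-result rule is to catch failures per call, so a single failing tool does not reject the whole `Promise.all`. The names `executeToolCalls`, `PendingCall`, and `ToolResult` are hypothetical, and `callTool` again stands in for `callMap.call()`.

```typescript
type PendingCall = { id: string; name: string; args: unknown };
type ToolResult = { toolCallId: string; output?: unknown; errorText?: string };

async function executeToolCalls(
  calls: PendingCall[],
  callTool: (call: PendingCall) => Promise<unknown>, // stands in for callMap.call()
): Promise<ToolResult[]> {
  return Promise.all(
    calls.map(async (call): Promise<ToolResult> => {
      try {
        return { toolCallId: call.id, output: await callTool(call) };
      } catch (err) {
        // A failure becomes a result the LLM can react to, not a loop abort.
        return { toolCallId: call.id, errorText: err instanceof Error ? err.message : String(err) };
      }
    }),
  );
}
```

`Promise.all` preserves input order, so results line up with the `tool_calls` of the response regardless of which call finishes first.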

`UIMessage` type ownership

ADR-016 already established that the hub owns its schema. We now also own the TypeScript type definition:

// src/inference/types.ts

/** Tool call lifecycle states. */
type ToolCallState =
  | "streaming"   // arguments are being streamed (tool-input-delta events)
  | "call"        // arguments complete, awaiting execution
  | "result"      // tool executed successfully, output available
  | "error";      // tool execution failed, errorText available

/** Compatible with AI SDK UIMessage but owned by the hub. */
type UIPart =
  | { type: "text"; text: string; state?: "streaming" | "done" }
  | { type: "reasoning"; text: string; state?: "streaming" | "done" }
  | { type: "tool"; toolCallId: string; toolName: string; state: ToolCallState; input?: unknown; output?: unknown; errorText?: string }
  | { type: "file"; mediaType: string; url: string; filename?: string }
  | { type: "source-url"; sourceId: string; url: string; title?: string }
  | { type: "step-start" }
  | { type: "data"; id?: string; data: unknown; transient?: boolean };

type UIMessage = {
  id: string;
  role: "system" | "user" | "assistant";
  parts: UIPart[];
  metadata?: {
    model?: string;
    provider?: string;
    tokens?: { prompt: number; completion: number; total: number };
    cost?: number;
    finishReason?: string;
    [key: string]: unknown;
  };
};

This is a starting subset of the AI SDK's part types (which includes source-document, dynamic-tool, approval-requested, etc.). We add types as the hub needs them. Import compatibility with opencode sessions remains possible through a mapping layer.

Note on metadata: The metadata field is typed as a structured object (not unknown) because the hub always populates it with model, provider, usage, and finish reason data from the LLM response. The [key: string]: unknown index signature allows extensibility without losing type safety for the known fields.
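Step 2 of the AgentLoop converts stored messages to OpenAI chat messages. The sketch below shows one possible mapping for text and tool parts only. The output shapes follow the Chat Completions message format; `toChatMessages` and the trimmed-down `Message`/`Part` types are hypothetical, and the sketch deliberately ignores `step-start` boundaries, so a multi-step assistant message is flattened into one assistant message followed by its tool results.

```typescript
type ToolPart = {
  type: "tool"; toolCallId: string; toolName: string;
  state: "streaming" | "call" | "result" | "error";
  input?: unknown; output?: unknown; errorText?: string;
};
type TextPart = { type: "text"; text: string };
type Part = TextPart | ToolPart | { type: "step-start" };
type Message = { id: string; role: "system" | "user" | "assistant"; parts: Part[] };

type ChatToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };
type AssistantMessage = { role: "assistant"; content: string | null; tool_calls?: ChatToolCall[] };
type ChatMessage =
  | { role: "system" | "user"; content: string }
  | AssistantMessage
  | { role: "tool"; tool_call_id: string; content: string };

function toChatMessages(messages: Message[]): ChatMessage[] {
  const out: ChatMessage[] = [];
  for (const message of messages) {
    const role = message.role;
    const text = message.parts
      .filter((p): p is TextPart => p.type === "text")
      .map((p) => p.text)
      .join("");
    if (role !== "assistant") {
      out.push({ role, content: text });
      continue;
    }
    // Only settled tool calls are replayed; "streaming"/"call" parts have no result yet.
    const settled = message.parts.filter(
      (p): p is ToolPart => p.type === "tool" && (p.state === "result" || p.state === "error"),
    );
    const assistant: AssistantMessage = { role: "assistant", content: text || null };
    if (settled.length > 0) {
      assistant.tool_calls = settled.map((p): ChatToolCall => ({
        id: p.toolCallId,
        type: "function",
        function: { name: p.toolName, arguments: JSON.stringify(p.input ?? {}) },
      }));
    }
    out.push(assistant);
    for (const p of settled) {
      const content =
        p.state === "error" ? (p.errorText ?? "tool failed") : JSON.stringify(p.output ?? null);
      out.push({ role: "tool", tool_call_id: p.toolCallId, content });
    }
  }
  return out;
}
```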

Meta-tools: same interface as MCP server

Instead of flattening the entire operation registry into individual OpenAI tool definitions (one per operation, with all the schema normalization that requires), the AgentLoop provides the LLM with the same 4 discovery+call meta-tools that the MCP server already exposes (see mcp-server.md):

| Tool | Input | Output | Description |
| --- | --- | --- | --- |
| `hub.list` | `{ namespace?: string }` | `OperationSpec[]` | List available operations, optionally filtered by namespace |
| `hub.search` | `{ q?: string, namespace?: string }` | `{ tool, description }[]` | Search operations by query string and/or namespace |
| `hub.schema` | `{ tool: string }` | `{ inputSchema, outputSchema }` | Get TypeBox schemas for a specific operation |
| `hub.call` | `{ calls: [{ tool, input? }] }` | `{ success, result/error }[]` | Execute operations via call protocol (supports batch) |

Why this matters for OpenAI integration: Each of these 4 tools has a small, stable schema. The LLM's context always contains just 4 tool definitions, not N. The LLM discovers what it needs (search, schema), then calls it. This is the same pattern that works for MCP clients, and it works identically for OpenAI tool-calling agents. No schema normalization is needed — the 4 schemas are hand-defined and fully under the hub's control.
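As an illustration of how small these definitions are, the four tools could be written as OpenAI function tool definitions along these lines. The JSON Schemas are derived from the table above; the helper names are hypothetical. One detail is an assumption worth checking against the target providers: OpenAI documents function names as limited to letters, digits, underscores, and dashes (max 64 characters), so the sketch maps `hub.list` to `hub_list` on the wire. Whether the hub does this, or relies on providers that accept dots, is not stated in this ADR.

```typescript
type FunctionTool = {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
};

// Assumption: dots are replaced on the wire to satisfy OpenAI's function name rules.
const toWireName = (tool: string): string => tool.replace(/\./g, "_");

// Helper for a closed JSON Schema object (additionalProperties: false).
const objectSchema = (properties: Record<string, unknown>, required: string[] = []) => ({
  type: "object",
  properties,
  required,
  additionalProperties: false,
});

const defineTool = (
  name: string,
  description: string,
  parameters: Record<string, unknown>,
): FunctionTool => ({
  type: "function",
  function: { name: toWireName(name), description, parameters },
});

const metaTools: FunctionTool[] = [
  defineTool("hub.list", "List available operations, optionally filtered by namespace",
    objectSchema({ namespace: { type: "string" } })),
  defineTool("hub.search", "Search operations by query string and/or namespace",
    objectSchema({ q: { type: "string" }, namespace: { type: "string" } })),
  defineTool("hub.schema", "Get TypeBox schemas for a specific operation",
    objectSchema({ tool: { type: "string" } }, ["tool"])),
  defineTool("hub.call", "Execute operations via call protocol (supports batch)",
    objectSchema({
      calls: {
        type: "array",
        // `input` is typed loosely here; the call protocol validates it per operation.
        items: objectSchema({ tool: { type: "string" }, input: { type: "object" } }, ["tool"]),
      },
    }, ["calls"])),
];
```

The array is constant: adding or changing an operation in the registry does not touch it, which is the property the paragraph above relies on.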

hub.call routes through callMap.call() (the call protocol), not registry.execute() directly. This gives full call graph tracking, abort cascading, and structured error handling — the same for both MCP clients and direct agents.

Batch calls: hub.call accepts an array of { tool, input } pairs and returns an array of results. This replaces the previous "batch by default" concept with a single, explicit tool. The LLM can batch multiple independent calls in a single tool invocation, which is more token-efficient than making N separate calls.

Agent workflow (same as MCP workflow from mcp-server.md):

Agent: "I need to spawn a worktree for the auth feature"
  → hub.search({ q: "spawn" })          → [{ tool: "coord.spawn", description: "..." }]
  → hub.schema({ tool: "coord.spawn" }) → { inputSchema: { sessionId, task, branch, ... }, ... }
  → hub.call({ calls: [{ tool: "coord.spawn", input: { ... } }] })

Agent: "Let me also check status and send a message"
  → hub.call({ calls: [
      { tool: "coord.status", input: { parentSessionId: "..." } },
      { tool: "coord.message", input: { sessionId: "...", body: "..." } }
    ] })
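The discovery workflow above can be exercised against an in-memory stand-in. This sketch is illustrative only: `createMetaTools` and the `Operation` shape are hypothetical, `run` stands in for routing through `callMap.call()`, and the search is a naive substring match rather than whatever the MCP server actually implements.

```typescript
// Hypothetical in-memory operation record; `run` stands in for callMap.call().
type Operation = {
  name: string;
  description: string;
  inputSchema: unknown;
  outputSchema: unknown;
  run: (input: unknown) => Promise<unknown>;
};
type CallResult = { success: true; result: unknown } | { success: false; error: string };

function createMetaTools(ops: Operation[]) {
  const byName = new Map<string, Operation>();
  for (const op of ops) byName.set(op.name, op);
  const inNamespace = (op: Operation, namespace?: string) =>
    !namespace || op.name.startsWith(namespace + ".");

  return {
    "hub.list": async (args: { namespace?: string }) =>
      ops.filter((op) => inNamespace(op, args.namespace))
        .map((op) => ({ tool: op.name, description: op.description })),

    "hub.search": async (args: { q?: string; namespace?: string }) => {
      const q = (args.q ?? "").toLowerCase();
      return ops
        .filter((op) => inNamespace(op, args.namespace))
        .filter((op) => (op.name + " " + op.description).toLowerCase().includes(q))
        .map((op) => ({ tool: op.name, description: op.description }));
    },

    "hub.schema": async (args: { tool: string }) => {
      const op = byName.get(args.tool);
      if (!op) throw new Error("unknown operation: " + args.tool);
      return { inputSchema: op.inputSchema, outputSchema: op.outputSchema };
    },

    // Batch call: one result per requested call, in request order.
    "hub.call": async (args: { calls: { tool: string; input?: unknown }[] }) =>
      Promise.all(
        args.calls.map(async ({ tool, input }): Promise<CallResult> => {
          const op = byName.get(tool);
          if (!op) return { success: false, error: "unknown operation: " + tool };
          try {
            return { success: true, result: await op.run(input) };
          } catch (err) {
            return { success: false, error: err instanceof Error ? err.message : String(err) };
          }
        }),
      ),
  };
}
```

With one registered `coord.spawn` operation, `hub.search({ q: "spawn" })` returns its name and description, and a batch containing an unknown tool yields a `success: false` entry without failing the other calls.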

No operationToOpenAITool() or normalizeSchemaForOpenAI() needed: The 4 meta-tool schemas are hand-defined, small, and don't change when operations are added or modified. The previous approach of converting each IOperationDefinition.inputSchema (TypeBox → JSON Schema) to OpenAI's function calling format required normalization (OpenAI requires additionalProperties: false, top-level type: "object", no $ref, no patternProperties). That entire problem disappears.

Streaming format for the proxy

The hub's proxy emits SSE events using a subset of the AI SDK's UIMessageChunk protocol. We emit only the chunk types we need:

Content events:

text-start, text-delta, text-end — text content
reasoning-start, reasoning-delta, reasoning-end — reasoning content

Tool call lifecycle events:

tool-input-start — the LLM is calling a tool (includes toolCallId, toolName)
tool-input-delta — streaming tool arguments (JSON fragments)
tool-input-available — complete tool arguments received (parsed JSON)
tool-output-available — tool execution result (emitted after the call routed through callMap.call() completes)
tool-output-error — tool execution error

Step and message boundary events:

start — message begins (includes optional messageId)
step-start — new step begins (after tool results are fed back)
step-finish — step ends (after LLM response, before tool execution)
finish — message complete (includes finishReason, usage tokens, metadata)

Error events:

error — stream error (includes errorText)
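The event vocabulary above maps naturally onto a discriminated union plus a one-line SSE encoder. In this sketch the event names come from the lists above, and `toolCallId`, `toolName`, `messageId`, `finishReason`, and `errorText` are fields this ADR mentions; the remaining payload fields (`id`, `delta`, `inputTextDelta`, `input`, `output`) are assumptions modeled loosely on the AI SDK's chunk shapes, and `HubChunk`/`encodeSse` are hypothetical names.

```typescript
type Usage = { promptTokens: number; completionTokens: number; totalTokens: number };

type HubChunk =
  | { type: "start"; messageId?: string }
  | { type: "text-start" | "text-end" | "reasoning-start" | "reasoning-end"; id: string }
  | { type: "text-delta" | "reasoning-delta"; id: string; delta: string }
  | { type: "tool-input-start"; toolCallId: string; toolName: string }
  | { type: "tool-input-delta"; toolCallId: string; inputTextDelta: string }
  | { type: "tool-input-available"; toolCallId: string; toolName: string; input: unknown }
  | { type: "tool-output-available"; toolCallId: string; output: unknown }
  | { type: "tool-output-error"; toolCallId: string; errorText: string }
  | { type: "step-start" | "step-finish" }
  | {
      type: "finish";
      finishReason: "stop" | "tool-calls" | "length" | "max-steps" | "error";
      usage?: Usage;
    }
  | { type: "error"; errorText: string };

// One SSE event per chunk: a single `data:` line terminated by a blank line.
// JSON.stringify escapes newlines, so the payload always fits on one line.
function encodeSse(chunk: HubChunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}
```

Because both streaming paths emit the same union, a client can switch on `type` without knowing whether the stream came from the proxy handler or the AgentLoop.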

Two streaming paths produce the same output format:

Proxy path (dev spoke or external client → Hono HTTP endpoint → provider): The proxy receives OpenAI SSE chunks and transforms them into hub chunk format. This is the SSE handler in the proxy.
Direct agent path (hub process → openai SDK → proxy → provider): The AgentLoop consumes the openai SDK's streaming response and emits the same hub chunk format. The internal format is the same; only the input source differs.

Both paths emit the same SSE format to clients. The direct agent path has the additional responsibility of tool execution and loop management, but the streaming event vocabulary is identical.

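Tool argument accumulation: On the proxy path the hub forwards tool-input-delta events as they arrive, and the client is responsible for accumulating the JSON fragments into complete tool arguments. On the direct agent path the AgentLoop needs the complete arguments before it can execute a tool. client.chat.completions.create({ stream: true }) yields raw chunks whose tool call deltas carry argument fragments keyed by index, so the loop concatenates them itself (or relies on the openai SDK's streaming helper, which accumulates them). In both cases the tool-input-available event contains the complete parsed JSON input.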
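A minimal sketch of that accumulation, assuming the delta shape used by Chat Completions streaming (`index`, optional `id`, optional `function.name`, and `function.arguments` fragments). `accumulateToolCalls` and the local type names are hypothetical, and the sketch treats an empty argument string as `{}`.

```typescript
// One entry of `choices[0].delta.tool_calls` in a streamed chunk.
type ToolCallDelta = {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
};
type CompleteToolCall = { toolCallId: string; toolName: string; input: unknown };

function accumulateToolCalls(deltas: ToolCallDelta[]): CompleteToolCall[] {
  const byIndex = new Map<number, { id: string; name: string; argsText: string }>();
  for (const delta of deltas) {
    const entry = byIndex.get(delta.index) ?? { id: "", name: "", argsText: "" };
    if (delta.id) entry.id = delta.id; // id and name usually arrive on the first fragment
    if (delta.function?.name) entry.name = delta.function.name;
    entry.argsText += delta.function?.arguments ?? "";
    byIndex.set(delta.index, entry);
  }
  return Array.from(byIndex.entries())
    .sort(([a], [b]) => a - b)
    .map(([, entry]) => ({
      toolCallId: entry.id,
      toolName: entry.name,
      // Parsing happens only once the stream is complete; partial JSON would throw.
      input: JSON.parse(entry.argsText || "{}"),
    }));
}
```

Keying by `index` rather than `id` matters because later fragments of a call typically carry only the index.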

Finish event includes usage data: The finish event includes usage with { promptTokens, completionTokens, totalTokens } and finishReason ("stop", "tool-calls", "length", "max-steps", "error").

Dependencies removed

| Package | Version | Notes |
| --- | --- | --- |
| `ai` | (was planned) | Core AI SDK: streaming, tool calling, UIMessage |
| `@ai-sdk/openai-compatible` | (was planned) | Provider for OpenAI-compatible APIs |
| `@ai-sdk/provider` | (transitive) | Provider interface |
| `@ai-sdk/provider-utils` | (transitive) | Provider utilities |
| `zod` | (peer dep) | No longer needed as AI SDK peer dep; we use TypeBox |

Dependencies added

| Package | Version | Purpose |
| --- | --- | --- |
| `openai` | Pinned in deno.json | Direct OpenAI API client, zero runtime deps |

Per project convention (AGENTS.md: "Pin dependency versions in deno.json — update manually when needed"), the openai package will be pinned to a specific version.

Documents requiring update

| Document | Change | Status |
| --- | --- | --- |
| `AGENTS.md` | Remove AI SDK from External Dependencies and Constraints. Add `openai` with pinned version. Update `src/inference/` description. | ✅ Done |
| `docs/architecture/agent-sessions.md` | Remove `streamText`/`generateText`/`proxyProvider`/`operationToTool` references. Replace with `AgentLoop` using `openai` SDK and 4 meta-tools (hub.list/search/schema/call). Document unified MCP/direct agent tool interface. Update session data shapes. | Pending |
| `docs/architecture/open-questions.md` | Add OQ-63, OQ-64, OQ-65. Add Theme 11. Add ADR-018 to resolved table. Add inference chain to cross-cutting dependencies. | ✅ Done |
| `docs/architecture/packages.md` | Replace "Agent sessions (AI SDK)" with "Agent sessions (openai SDK + AgentLoop)" or similar. | Pending |

Consequences

Positive

Reduced supply chain attack surface: Zero transitive dependencies from the LLM calling path. The openai package has zero runtime dependencies and is auto-generated from OpenAPI spec.
No AI SDK release cadence coupling: We update the openai package on our schedule, not at 2-5 releases/day.
Reduced bundle size: The AI SDK core (ai) is ~50 kB minified, @ai-sdk/provider adds ~19.5 kB, plus @ai-sdk/provider-utils and transitive deps. The openai package is ~129.5 kB but has zero transitive deps, so the total install footprint is significantly smaller than ai plus its dependency tree. More importantly, the hub's own streaming code (~500 LOC for the SSE transformer + AgentLoop, per Implementation scope) is a fraction of the AI SDK's ~2700 lines of streamText() alone, and we only ship what we use.
Hub-own streaming protocol: We define and evolve the SSE chunk types we need without waiting for AI SDK releases. New part types or chunk types can be added immediately.
Unified tool interface for MCP and direct agents: The same 4 discovery+call meta-tools (hub.list/hub.search/hub.schema/hub.call) serve both MCP clients and OpenAI tool-calling agents. No per-operation schema conversion, no normalizeSchemaForOpenAI(), no adapter layers. The LLM discovers operations the same way an MCP client does — search, get schema, call.
Consistent with existing patterns: The operations registry and call protocol are unchanged. Direct agents route tool calls through callMap.call() — the same path as MCP clients and spoke calls.
Consistent with ADR-015 and ADR-016: We've removed opencode's influence on the hub's data model. Removing the AI SDK continues this pattern — the hub owns its types, its streaming protocol, and its tool calling format.
Explicit agent loop: The AgentLoop is hub code that we can debug, extend, and add observability to. Multi-step tool execution, max steps, error recovery, and usage tracking are all visible and modifiable. The AI SDK's streamText() hides this loop inside ~2700 lines of framework code.

Negative

More code to maintain: The AgentLoop, streaming state machine, and tool execution orchestration are additional hub code. However, this code is bounded (~1,060 lines total, see Implementation scope), well-understood (LLM → tool call → execute → feed result → repeat), and has clear input/output formats. The AI SDK's equivalent is ~2700 lines of streamText() + the provider abstraction + the tool framework.
No multi-provider abstraction: The AI SDK lets you swap providers with one line (anthropic(...) → openai(...)). With the openai SDK, we're locked to OpenAI-compatible APIs. But the hub's proxy already abstracts this — all LLM calls go through /v1/chat/completions, and the proxy routes to providers. Adding a new provider means adding a route in the proxy, not swapping AI SDK providers. For providers that don't support OpenAI-compatible APIs (e.g., Anthropic native), the proxy translates the format.
No AI SDK React hooks: We can't use useChat or useCompletion on the frontend. The hub doesn't have a React frontend — it has an API server. Frontend concerns are out of scope.
No per-operation type safety in tool calls: The AI SDK's tool() function provides Zod-based type safety for each tool's input/output. With meta-tools, hub.call accepts { tool: string, input?: unknown }; the LLM gets the schema via hub.schema and constructs the input. The call protocol validates the input against the operation's TypeBox schema at execution time, so invalid inputs are caught, but hub code gets no compile-time type checking for individual operation calls. This is the same trade-off MCP clients accept, and it is inherent to the discovery+call model.

Implementation scope

| Component | Estimated effort | Notes |
| --- | --- | --- |
| `UIMessage` + `UIPart` + `ToolCallState` type definitions | Small (~60 lines) | Plain TypeScript interfaces |
| 4 meta-tool OpenAI definitions + schema | Small (~40 lines) | `hub.list`/`hub.search`/`hub.schema`/`hub.call`: same schemas as MCP server, wrapped as OpenAI function definitions |
| OpenAI proxy SSE handler (Hono) | Medium (~250 lines) | Transform OpenAI SSE → hub chunk format, includes step boundary events |
| `AgentLoop`: multi-step tool execution loop | Medium (~250 lines) | Step management, tool call detection, routing via callMap.call(), result feeding, max steps, usage accumulation. Simpler than before: tool calls all go through the same call protocol path. |
| Direct agent stream consumer | Small (~80 lines) | Consume `openai` SDK streaming response, emit hub chunk events |
| Part persistence from stream | Medium-Large (~250 lines) | Map stream chunks to `parts` table inserts/updates, buffered write strategy (flush on `*-end` events), state transitions |
| Proxy key routing | Small (~50 lines) | Resolve `clients` + `client_secrets` for provider keys |
| Error handling + retry logic | Small-Medium (~80 lines) | Exponential backoff for 429/5xx, non-retryable error mapping |

Total: ~1060 lines of focused, well-bounded code with clear input/output formats.

The AgentLoop is the most significant component. Its contract is simple:

Input: messages + 4 meta-tool definitions + model config
Output: SSE stream of hub chunk events + final UIMessage + usage data
Loop: call → accumulate → detect tools → route through callMap.call() → feed → repeat

The AI SDK's streamText() handles this loop in ~2700 lines (including provider abstraction, middleware hooks, multi-model smoothing, and edge cases we don't need). Our AgentLoop handles exactly our use case in ~250 lines.

Open questions affected

| OQ | Impact |
| --- | --- |
| OQ-16 | Simplified: ADR-016 resolved this (hub owns its schema). This ADR extends that to TypeScript types. The hub defines `UIMessage`, `UIPart`, and `ToolCallState` types. |
| Agent sessions architecture (`agent-sessions.md`) | Needs update: Remove `streamText`/`generateText`/`proxyProvider`/`operationToTool` references. Replace with `AgentLoop` using `openai` SDK and 4 meta-tools. Document the unified MCP/direct agent tool interface. |
| `AGENTS.md` Constraints and Dependencies | Needs update: Remove AI SDK from dependencies and constraints. Add `openai` package with pinned version. Update `src/inference/` description. |

Open questions created

| ID | Question | Priority |
| --- | --- | --- |
| OQ-63 | What is the exact subset of `UIMessageChunk` types the hub proxy emits? (This ADR lists the initial subset, but extensions will happen as features are added.) | medium |
| OQ-64 | Should the direct agent use the `openai` SDK's streaming API or raw HTTP for more control? The `openai` SDK provides a convenient typed interface, but raw HTTP gives more control over SSE parsing for the proxy path. | low |
| OQ-65 | What is the buffered write strategy for part persistence? Options: flush on `*-end` events (per-part commits), flush on `step-finish` (per-step commits), or flush on `finish` (per-message commits). Per-step balances latency and write volume. | medium |

References

ADR-015: Dev spoke instead of opencode integration — removed opencode dependency
ADR-016: Hub-own schema — hub owns session/message/part schema
MCP server: Discovery + Call Interface — the 4 meta-tools model that now serves both MCP and direct agents
AI SDK supply chain risk assessment — detailed analysis of AI SDK risks
agent-sessions.md — current session architecture (references AI SDK)
OpenAI Node SDK — zero-dependency, auto-generated from OpenAPI spec
