ADR-018: Remove AI SDK, use openai SDK directly with hub-own streaming

Replace the Vercel AI SDK with direct OpenAI SDK calls and a custom AgentLoop. The AI SDK has zero runtime integration today, so removing it costs nothing. Supply chain risk (2-5 releases/day, April 2026 Vercel breach, bus factor of 1) makes it a liability we don't need.

Key changes:
- ADR-018 accepted: openai package (zero runtime deps) replaces ai SDK
- AgentLoop handles multi-step tool execution explicitly (~300 LOC vs AI SDK's ~2700 LOC streamText)
- Hub owns UIMessage/UIPart/ToolCallState types (extends ADR-016)
- Hub owns streaming protocol (subset of AI SDK's UIMessageChunk wire format with step boundaries, error handling, usage tracking)
- operationToOpenAITool() maps TypeBox schemas directly, no adapter
- Trade-off: ~1100 LOC total new code for the savings of 6+ transitive deps, supply chain risk, and release cadence coupling

Updates AGENTS.md constraints and dependencies, adds OQ-63/OQ-64/OQ-65 and Theme 11 (Inference & LLM Integration) to open questions.
2026-05-26 08:55:52 +00:00
parent 2d7f9c11cb
commit a248698f40
4 changed files with 634 additions and 3 deletions
--- a/docs/research/ai-sdk-supply-chain-risk.md
+++ b/docs/research/ai-sdk-supply-chain-risk.md
@@ -0,0 +1,257 @@
+# Research: Vercel AI SDK Supply Chain Risk Assessment
+
+## Question
+
+Should we use the Vercel AI SDK (`ai` npm package + `@ai-sdk/openai-compatible`) as our LLM integration layer, or should we use OpenAI's SDK directly (`openai` npm package)? What are the supply chain risks?
+
+## Executive Summary
+
+The Vercel AI SDK presents **moderate supply chain risk**: not negligible, not critical. The April 2026 Vercel security incident is a real concern, but npm packages were confirmed uncompromised. The dependency tree is shallow and well-scoped. The main risk vectors are: (1) an extreme release cadence that enlarges the attack surface, (2) the Vercel corporate attack surface after the April 2026 breach, and (3) the unusual `@workflow/serde` dependency, which appears in the GitHub source but not in the published npm package. Using the OpenAI SDK directly eliminates most of these risks at the cost of more boilerplate and no multi-provider abstraction.
+
+---
+
+## 1. Release Frequency and Pattern
+
+### Findings
+
+The `ai` npm package has released **1,224 versions total**. In 2026 alone:
+
+| Month | Releases |
+|-------|----------|
+| 2026-01 | 81 |
+| 2026-02 | 56 |
+| 2026-03 | 110 |
+| 2026-04 | 109 |
+| 2026-05 | 58 (partial) |
+
+**That's approximately 2-5 releases per day across stable + canary + beta channels.**
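+
+As a quick check on that figure, the sketch below divides each complete month's count from the table by the number of days in that month. The helper names are illustrative, and May is left out because the month was only partially observed. Under these assumptions the monthly averages fall between roughly 2.0 and 3.6 releases per day.
+
+```typescript
+// Monthly release counts for the `ai` package, copied from the table above.
+// May 2026 is omitted because the month was only partially observed.
+const releases: Record<string, number> = {
+  "2026-01": 81,
+  "2026-02": 56,
+  "2026-03": 110,
+  "2026-04": 109,
+};
+
+// Number of days in a "YYYY-MM" month. Day 0 of the following month
+// resolves to the last day of the requested month.
+function daysInMonth(yearMonth: string): number {
+  const [year, month] = yearMonth.split("-").map(Number);
+  return new Date(Date.UTC(year, month, 0)).getUTCDate();
+}
+
+// Average releases per calendar day, rounded to one decimal place.
+function releasesPerDay(yearMonth: string, count: number): number {
+  return Math.round((count / daysInMonth(yearMonth)) * 10) / 10;
+}
+
+const perDay = Object.entries(releases).map(
+  ([month, count]) => [month, releasesPerDay(month, count)] as const,
+);
+console.log(perDay);
+```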
+
+The latest stable is `ai@6.0.191` (published May 22, 2026). The canary channel sits at `7.0.0-canary.152`. They maintain 3 concurrent version lines: v5, v6, and v7-canary.
+
+**Release automation**: The release process is fully automated via GitHub Actions using Changesets (`pnpm changeset` → PR → auto-merge → auto-publish). The release workflow (`.github/workflows/release.yml`) triggers on pushes to `main` that include `.changeset/` changes. The `github-actions[bot]` account is the #2 contributor (773 commits). npm provenance is enabled (`publishConfig.provenance: true`).
+
+### Risk Assessment
+
+- **High release cadence = high surface area**: 2-5 releases/day means constant churn on the supply chain. Pinning is essential.
+- **Automated releases via bot**: The release process is CI/CD automated with Changesets, which is good for consistency but means any compromise of the CI pipeline could push malicious packages.
+- **Positive**: npm provenance is enabled, meaning npm publishes are linked to GitHub Actions runs and specific commits. This provides verifiable attestation.
+
+---
+
+## 2. Known Supply Chain Incidents
+
+### CVE-2025-48985 (Low Severity, 3.7 CVSS)
+
+A filetype allowlist bypass vulnerability in the AI SDK's file upload functionality. Fixed in versions 5.0.52, 5.1.0-beta.9, and 6.0.0-beta. This is relevant if using the AI SDK's file upload features; not relevant for our inference proxy use case.
+
+### Vercel April 2026 Security Incident (HIGH significance)
+
+**This is the most significant finding.**
+
+- **What happened**: A Vercel employee's Google Workspace account was compromised via a supply chain attack on Context.ai (a third-party AI tool). The attacker used a compromised OAuth token to pivot into Vercel's internal systems.
+- **Impact**: Non-sensitive environment variables were compromised. A threat actor claimed to have obtained a Vercel database access key and partial source code, selling data on BreachForums for $2M.
+- **npm packages confirmed safe**: Vercel, in collaboration with GitHub, Microsoft, npm, and Socket, confirmed that no npm packages published by Vercel were compromised.
+- **But**: The attacker had access to Vercel's internal systems, including potentially npm publish tokens. Vercel rotated all credentials.
+- **Risk remains**: The breach revealed that Vercel's internal systems are a target. If the attacker had accessed npm publish tokens before rotation, packages could have been poisoned.
+
+### September 2025 npm Supply Chain Attack
+
+Vercel published a response to the wider npm ecosystem attack that compromised `chalk`, `debug`, and 16 other packages. Vercel was not the origin — this was an ecosystem-wide incident. Vercel purged build caches for 76 affected projects.
+
+### No AI SDK-specific package poisoning
+
+There is no evidence that any `ai` or `@ai-sdk/*` package has ever been directly compromised or published with malicious code.
+
+---
+
+## 3. Dependency Tree
+
+### `ai@6.0.191` (core package)
+
+```
+ai@6.0.191
+├── @ai-sdk/gateway@3.0.120
+│   ├── @ai-sdk/provider@3.0.10
+│   │   └── json-schema@0.4.0           (leaf, zero deps)
+│   ├── @ai-sdk/provider-utils@4.0.27
+│   │   ├── @ai-sdk/provider@3.0.10      (duplicate, same)
+│   │   ├── @standard-schema/spec@1.1.0  (leaf, zero deps)
+│   │   └── eventsource-parser@3.0.8     (leaf, zero deps)
+│   └── @vercel/oidc@3.4.1              (leaf, zero deps after checking)
+├── @ai-sdk/provider@3.0.10             (duplicate)
+├── @ai-sdk/provider-utils@4.0.27      (duplicate)
+└── @opentelemetry/api@1.9.1            (leaf, zero deps)
+
+Peer dependency:
+└── zod@^3.25.76 || ^4.1.8
+```
+
+### `@ai-sdk/openai-compatible@2.0.48` (stable)
+
+```
+@ai-sdk/openai-compatible@2.0.48
+├── @ai-sdk/provider@3.0.10
+└── @ai-sdk/provider-utils@4.0.27
+
+Peer dependency:
+└── zod@^3.25.76 || ^4.1.8
+```
+
+### Notable observations
+
+| Dependency | Assessment |
+|-----------|------------|
+| `json-schema@0.4.0` | **Old** (last meaningful update was years ago). Single-maintainer risk. Used only for JSON Schema type definitions in `@ai-sdk/provider`, not for runtime data processing. |
+| `@workflow/serde@4.1.0` | **Unusual** — this appeared in the GitHub source (`provider-utils/package.json`) but is NOT in the published npm version. Likely removed during build/publish. This is a Vercel-internal workflow library. |
+| `eventsource-parser@3.0.8` | Single-purpose, well-maintained SSE parser. Zero deps. Low risk. |
+| `@standard-schema/spec@1.1.0` | New standard schema specification. Zero deps. Low risk. |
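+| `@vercel/oidc@3.4.1` | Vercel-specific OIDC library. Installed via `@ai-sdk/gateway`, which the tree above shows as a direct dependency of `ai`, but only exercised if the gateway provider is used. Low risk for our use case. |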
+| `@opentelemetry/api@1.9.1` | Standard OpenTelemetry interface. Zero deps. Well-governed CNCF project. Low risk. |
+| `zod` (peer dep) | Standard validation library. Already in our stack. |
+
+**Dependency depth**: 3 levels maximum. Most paths are 2 levels deep. This is **good** — shallow tree means fewer transitive attack surfaces.
+
+**Concerning dependencies**: `json-schema@0.4.0` is the one to watch. It's old, unmaintained, and a single-maintainer package. However, it's only used for JSON Schema validation type definitions in `@ai-sdk/provider`, not for runtime data processing, so the blast radius is limited.
+
+---
+
+## 4. Maintainer and Governance Model
+
+### Core Team
+
+| Contributor | Commits | Role |
+|------------|---------|------|
+| lgrammel | 1,980 | Lead maintainer (Vercel employee) |
+| github-actions[bot] | 773 | Automated CI/CD |
+| gr2m | 352 | Contributor (also at Vercel) |
+| nicoalbanese | 352 | Contributor (Vercel employee) |
+| shaper | 285 | Contributor |
+| dancer | 274 | Contributor |
+
+**Bus factor: 1.** Lars Grammel (lgrammel) is the overwhelmingly dominant contributor. The project is a Vercel corporate project, not a community foundation project. Vercel has financial incentive to maintain it, but the knowledge concentration is extreme.
+
+### Governance
+
+- **License**: Apache-2.0 (permissive, good)
+- **Published via**: `vercel-release-bot` npm account
+- **No CODEOWNERS file** found in the repository
+- **No formal governance model** — it's a corporate open-source project with Vercel making all decisions
+- **No security policy file** (SECURITY.md) found in the repo root
+
+---
+
+## 5. Build and Publish Process
+
+| Aspect | Detail | Risk |
+|--------|--------|------|
+| **Source verifiable** | Yes — all code is on `github.com/vercel/ai`, publishes from CI | Low |
+| **npm provenance** | Enabled (`publishConfig.provenance: true`) | Low |
+| **Reproducible builds** | No — builds include `pnpm clean && tsup` step, not hermetic | Medium |
+| **Build toolchain** | `tsup` (esbuild-based bundler), `pnpm`, `vitest`, `turbo` | Low |
+| **CI environment** | GitHub Actions with `id-token: write` for OIDC provenance | Low |
+| **Release trigger** | Automated on push to `main` with changeset files | Medium (auto-merge risk) |
+| **Verified commits** | GitHub verified signatures on releases | Low |
+| **Lockfile integrity** | `pnpm-lock.yaml` committed to repo | Low |
+
+**Positive**: npm provenance is a significant supply chain security feature. It creates a verifiable link between the npm package and the specific GitHub Actions run and commit that produced it.
+
+**Concerning**: Auto-merge release PRs (`.github/workflows/auto-merge-release-prs.yml`) means that any change that gets a changeset merged to `main` will be automatically published. A compromised maintainer account or a malicious PR could result in a poisoned package.
+
+---
+
+## 6. Alternatives: OpenAI SDK Direct
+
+### `openai` npm package
+
+| Aspect | AI SDK (`ai` + `@ai-sdk/openai-compatible`) | OpenAI SDK (`openai`) |
+|-------|-------|--------|
+| **Version** | 6.0.191 | 6.39.0 |
+| **Release cadence** | 2-5/day (all channels) | ~1/week |
+| **Runtime dependencies** | 6 direct (3 workspace, 3 external) + transitive | **0** (zero runtime deps) |
+| **Dependency depth** | 3 levels | 0 levels |
+| **Peer dependencies** | zod | None |
+| **npm provenance** | Yes | Yes |
+| **License** | Apache-2.0 | Apache-2.0 |
+| **Maintainer** | Vercel (corporate, 1 dominant dev) | OpenAI (corporate, auto-generated from OpenAPI spec) |
+| **Node.js requirement** | >=22 | >=20 |
+| **Bundle size** | ~19.5 kB (provider), ~50 kB (core) | ~129.5 kB |
+| **Build system** | tsup (custom bundler) | Stainless (auto-generated SDK) |
+| **Source code** | github.com/vercel/ai (open) | github.com/openai/openai-node (open) |
+
+### Dependency Tree Comparison
+
+```
+ai@6.0.191 (Vercel AI SDK)
+└── 6 direct deps
+    └── ~8 transitive deps
+        ├── json-schema@0.4.0 (unmaintained)
+        ├── eventsource-parser@3.0.8 (single-purpose)
+        ├── @standard-schema/spec@1.1.0 (spec only)
+        ├── @opentelemetry/api@1.9.1 (CNCF)
+        └── @vercel/oidc@3.4.1 (Vercel-specific)
+
+openai@6.39.0 (OpenAI SDK)
+└── 0 runtime dependencies
+```
+
+### Trade-offs
+
+| Factor | AI SDK | OpenAI SDK |
+|--------|--------|------------|
+| **Multi-provider abstraction** | ✓ Switch providers with 1 line | ✗ Limited to OpenAI and OpenAI-compatible APIs |
+| **Streaming helpers** | ✓ Built-in `streamText`, React hooks | ✗ Manual SSE handling |
+| **Structured output / tool calling** | ✓ Type-safe with Zod schemas | ✗ Manual JSON Schema construction |
+| **Supply chain surface** | Medium (6+ deps, Vercel corporate risk) | **Minimal** (zero deps) |
+| **Type safety** | End-to-end (Zod integration) | API boundary only |
+| **Edge runtime** | Required for streaming | Both Node.js and Edge |
+| **Agent patterns** | Built-in (`ToolLoopAgent`, `generateText`) | Not included (use OpenAI Agents SDK separately) |
+| **Future multi-model** | Easy provider swap | Requires complete rewrite |
+| **API update speed** | Community-maintained adapters | Auto-generated from OpenAI spec |
+
+### What It Would Take to Switch
+
+For our use case (inference proxy for an OpenAI-compatible API):
+
+1. **Replace `@ai-sdk/openai-compatible`** → Create a thin adapter that implements the same `LanguageModelV4Spec` interface but wraps `openai` SDK calls directly
+2. **Replace `streamText`/`generateText`** → Use `openai` SDK's streaming API directly with our own stream framing
+3. **Replace tool calling** → Use OpenAI's tool calling API directly (JSON Schema definitions, manual response parsing)
+4. **Replace Zod integration** → Use our existing `@alkdev/typebox` schemas, convert to JSON Schema for OpenAI API calls
+5. **Estimated effort**: 2-3 days for a minimal proxy, 1-2 weeks for full feature parity including streaming responses
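+
+Steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Operation` interface, `toOpenAITool`, `parseToolArguments`, and the `read_file` operation are hypothetical names. It assumes a TypeBox schema serializes to a plain JSON Schema object, and it uses the `{ type: "function", function: { ... } }` tool shape of the Chat Completions API.
+
+```typescript
+// Hypothetical shape of one of our operations. The input schema is assumed to
+// be a plain JSON Schema object at runtime, so it is typed as a generic record.
+interface Operation {
+  name: string;
+  description: string;
+  inputSchema: Record<string, unknown>;
+}
+
+// Tool definition in the shape the Chat Completions API accepts in `tools`.
+interface OpenAITool {
+  type: "function";
+  function: {
+    name: string;
+    description: string;
+    parameters: Record<string, unknown>;
+  };
+}
+
+// Step 4: pass the schema through as JSON Schema. The JSON round trip drops
+// symbol-keyed metadata that a schema builder may attach.
+function toOpenAITool(op: Operation): OpenAITool {
+  return {
+    type: "function",
+    function: {
+      name: op.name,
+      description: op.description,
+      parameters: JSON.parse(JSON.stringify(op.inputSchema)),
+    },
+  };
+}
+
+// Step 3: tool-call arguments arrive as a JSON string and must be parsed by
+// hand. Validation against the schema is left out of this sketch.
+function parseToolArguments(raw: string): Record<string, unknown> {
+  const parsed: unknown = JSON.parse(raw);
+  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
+    throw new Error("tool arguments must be a JSON object");
+  }
+  return parsed as Record<string, unknown>;
+}
+
+// Example operation. The schema literal mirrors the JSON Schema that an object
+// type with one required string property would produce.
+const readFile: Operation = {
+  name: "read_file",
+  description: "Read a UTF-8 text file",
+  inputSchema: {
+    type: "object",
+    properties: { path: { type: "string" } },
+    required: ["path"],
+  },
+};
+
+console.log(JSON.stringify(toOpenAITool(readFile), null, 2));
+console.log(parseToolArguments('{"path":"README.md"}'));
+```
+
+Because the schema is forwarded unchanged, no Zod-to-JSON-Schema adapter is needed in this path; the cost is that argument validation becomes our responsibility.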
+
+---
+
+## Risk Summary
+
+| Risk | Likelihood | Impact | Mitigation |
+|------|-----------|--------|------------|
+| Vercel npm token compromise (post-April 2026) | Low-Medium | Critical | Pin exact versions, verify npm provenance, use lockfile |
+| `json-schema@0.4.0` supply chain | Low | Low | Only type definitions, not runtime execution |
+| Extreme release cadence causing regression | Medium | Medium | Pin versions, test before upgrade |
+| Bus factor (lgrammel dominance) | Medium | Medium | Pin versions, fork if needed (Apache-2.0) |
+| Auto-merge release pipeline compromise | Low | Critical | Verify provenance, audit CI pipeline |
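+| `@workflow/serde` / `@vercel/oidc` | Low | Low | `@workflow/serde` is not in the published package; `@vercel/oidc` is installed via `@ai-sdk/gateway` but only exercised if the gateway provider is used |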
+| Breaking changes across parallel version lines | Medium | Medium | Pin to v6 stable, lockfile |
+
+## Recommendation
+
+**Use the AI SDK, but with supply chain hardening:**
+
+1. **Pin exact versions** in `deno.json` and never use `^` ranges. Example: `"ai": "npm:ai@6.0.191"`, not `"ai": "npm:ai@^6.0.191"`.
+2. **Verify npm provenance** — check that published packages match their GitHub source commits.
+3. **Do not use `@ai-sdk/gateway`**: it is installed as a dependency of `ai`, but calling it exercises `@vercel/oidc`, which is unnecessary for our use case and adds Vercel-specific infrastructure coupling.
+4. **Use `@ai-sdk/openai-compatible`** specifically, not `@ai-sdk/openai` — the compatible provider is more generic and avoids OpenAI-specific code paths.
+5. **Set up automated dependency auditing** — run `npm audit` or Socket.dev scanning in CI.
+6. **Monitor the Vercel security bulletins** — subscribe to https://vercel.com/kb/bulletin.
+7. **Have a migration plan** — if supply chain concerns escalate, be ready to switch to the OpenAI SDK directly. The primary value we get from AI SDK is streaming abstractions and tool calling types, which can be reimplemented.
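+
+A pin check along the lines of items 1 and 5 could run in CI. The sketch below is an assumption about how such a check might look, not an existing tool: `isExactPin`, `findUnpinned`, and the sample `imports` map are hypothetical, and the check accepts both bare versions and `npm:` specifiers.
+
+```typescript
+// Returns true when a dependency specifier names exactly one version, such as
+// "6.0.191" or "npm:ai@6.0.191", and false for ranges such as "^6.0.191" or
+// for specifiers that carry no version at all.
+function isExactPin(specifier: string): boolean {
+  // Take the text after the last "@" so scoped names such as
+  // "npm:@ai-sdk/provider@3.0.10" are handled too.
+  const version = specifier.slice(specifier.lastIndexOf("@") + 1);
+  return /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/.test(version);
+}
+
+// Lists every entry of a dependency map that is not pinned to an exact version.
+function findUnpinned(deps: Record<string, string>): string[] {
+  return Object.entries(deps)
+    .filter(([, specifier]) => !isExactPin(specifier))
+    .map(([name]) => name);
+}
+
+// Sample import map with one pinned and one ranged entry.
+const imports: Record<string, string> = {
+  "ai": "npm:ai@6.0.191",
+  "@ai-sdk/openai-compatible": "npm:@ai-sdk/openai-compatible@^2.0.48",
+};
+
+console.log(findUnpinned(imports));
+```
+
+A CI job could fail the build whenever `findUnpinned` returns a non-empty list.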
+
+**If risk tolerance is lower**: Use the `openai` SDK directly. Zero dependencies, simpler supply chain, auto-generated from OpenAPI spec. The trade-off is more boilerplate for streaming and no multi-provider abstraction — but since we're already using `@alkdev/typebox` and have our own operation patterns, the AI SDK's value add is primarily in stream framing, which is ~200 lines of code to replicate.
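+
+To give a sense of what that stream framing involves, here is a minimal sketch that frames chunks as Server-Sent Events. The `StreamChunk` type, its chunk names, and the helper functions are hypothetical and far smaller than a production protocol; `fakeModel` stands in for a real model stream so the sketch runs without network access.
+
+```typescript
+// Hypothetical chunk types for a self-owned stream; the names are illustrative.
+type StreamChunk =
+  | { type: "text-delta"; delta: string }
+  | { type: "error"; message: string }
+  | { type: "finish" };
+
+// Frames one chunk as a Server-Sent Events message: a "data:" line holding
+// the JSON payload, terminated by a blank line.
+function frameSSE(chunk: StreamChunk): string {
+  return `data: ${JSON.stringify(chunk)}\n\n`;
+}
+
+// Turns an async stream of text deltas into framed SSE messages. A failure is
+// reported as an in-band error chunk rather than by silently ending the stream.
+async function* toSSE(deltas: AsyncIterable<string>): AsyncGenerator<string> {
+  try {
+    for await (const delta of deltas) {
+      yield frameSSE({ type: "text-delta", delta });
+    }
+    yield frameSSE({ type: "finish" });
+  } catch (err) {
+    const message = err instanceof Error ? err.message : String(err);
+    yield frameSSE({ type: "error", message });
+  }
+}
+
+// Stand-in for a model stream.
+async function* fakeModel(): AsyncGenerator<string> {
+  yield "Hel";
+  yield "lo";
+}
+
+// Collects every framed message so the result can be inspected.
+async function collect(): Promise<string[]> {
+  const out: string[] = [];
+  for await (const message of toSSE(fakeModel())) out.push(message);
+  return out;
+}
+
+collect().then((messages) => console.log(messages.join("")));
+```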
+
+## References
+
+- Vercel AI SDK GitHub: https://github.com/vercel/ai
+- Vercel April 2026 Security Bulletin: https://vercel.com/kb/bulletin/vercel-april-2026-security-incident
+- OX Security analysis of Vercel breach: https://www.ox.security/blog/vercel-context-ai-supply-chain-attack-breachforums/
+- CVE-2025-48985 (AI SDK file upload bypass): https://advisories.gitlab.com/pkg/npm/ai/
+- Vercel Sept 2025 npm supply chain response: https://vercel.com/blog/critical-npm-supply-chain-attack-response-september-8-2025
+- AI SDK release process (DeepWiki): https://deepwiki.com/vercel/ai/6.3-release-process-and-version-management
+- OpenAI Node SDK: https://github.com/openai/openai-node