
Vendor Lock-In with LLMs: How to Keep Providers Swappable
Prices change, models get deprecated, and providers face export limits overnight. How to architect LLM features so you can switch providers without a rewrite.
Key takeaways
- Put a thin adapter between your application and the model provider so swapping providers becomes a config change instead of a rewrite.
- Portability is a per-feature spectrum, with core revenue-touching features leaning portable and throwaway internal tools fine to keep locked in.
- Keep prompts written as plain instructions and own a provider-agnostic eval suite so you can prove a switch is safe before pulling the trigger.
- Confine provider-specific features like fine-tunes or proprietary tool formats inside one adapter rather than spreading them across your codebase.
If you only do one thing, put a thin adapter between your application and the model provider, and route every call through it. That single decision is what lets you swap providers in a config change instead of a rewrite. Everything else in this post is about where to draw that line, what to keep neutral on each side of it, and which kinds of lock-in are actually worth accepting.
I have watched teams get surprised three ways in the last two years: a price increase that quietly doubled a feature's unit economics, a model deprecation with a 90-day clock, and an export-control headline that made legal nervous about a provider overnight. None of those were predictable. All of them were survivable if the team had kept its options open.
Why lock-in bites harder with LLMs than with most SaaS
A database vendor rarely deprecates the engine you are running. A model provider does it on a schedule. The pace of change at the model layer is the whole reason this problem feels different.
Here is the short list of things that move under your feet:
- Price changes. Input and output token pricing gets cut and raised. A feature that pencils out at one price can stop penciling at another, and you do not control the timing.
- Model deprecations. The exact model snapshot you tuned prompts against gets retired. You get a window, not a veto. Your carefully tuned behavior may shift on the replacement.
- Export controls and policy. Government rules and provider policy can restrict who you serve or where you run inference. This is not hypothetical for anyone with customers or data residency requirements outside the US.
- Rate limits and quota. Your account tier caps throughput. During a traffic spike or a provider-wide demand surge, your ceiling is somebody else's business decision.
- Outages. Providers go down. When they do and you have no fallback, your feature goes down with them, and the postmortem is uncomfortable.
The common thread is that the things most likely to hurt you are decisions made inside the provider, not inside your codebase. You cannot prevent them. You can decide in advance how much they cost you.
The portability spectrum, not a binary
People frame this as "locked in" versus "portable." It is really a spectrum, and you choose a point on it per feature. A throwaway internal summarizer can sit at the locked-in end and nobody cares. Your core product feature that touches revenue belongs much further toward portable.
| Approach | Lock-in risk | Switching cost | What you give up |
|---|---|---|---|
| Direct SDK calls scattered through the app | High | Weeks of refactoring across many files | Nothing now, everything later |
| Single internal client wrapper (one file owns the SDK) | Medium | Days; one file to change, plus prompt cleanup | A little boilerplate |
| Adapter layer with a provider-neutral interface | Low | Hours to add a provider; config to switch | Some provider-specific features become opt-in |
| External model gateway (provider/model routing) | Low | Config change, no deploy | An extra hop, a dependency, and a bill |
| Multi-provider with live failover and evals | Lowest | Built in from day one | Real engineering time and ongoing test cost |
Most teams should land on rows three or four for anything that matters. Row five is for features where an outage is a board-level event. Rows one and two are fine for experiments and internal tools, and pretending otherwise just slows you down.
The adapter layer is the load-bearing wall
The pattern is boring and that is the point. Define an interface in your own terms, then write a small adapter per provider that satisfies it. Your application code never imports a provider SDK directly.
// Your terms, not the provider's.
interface ChatModel {
complete(input: {
system: string;
messages: { role: "user" | "assistant"; content: string }[];
maxTokens: number;
}): Promise<{ text: string; usage: { input: number; output: number } }>;
}
// One adapter per provider. The SDK only ever appears here.
class AnthropicAdapter implements ChatModel {
async complete(input) {
// map our neutral shape to the provider call, and back
}
}
A few rules keep this honest. The interface speaks your domain, not the union of every provider's API. If you let the abstraction grow to expose every provider knob, you have rebuilt lock-in with extra steps. When two providers disagree about a concept, pick the simpler model and make adapters do the translation. And keep the adapter dumb: no business logic, just mapping in and out.
This is most of the work. If your team is building features from scratch and wants this structured correctly the first time, that interface design is exactly the kind of decision we sweat in LLM application development, because getting the boundary right early is far cheaper than peeling SDK calls out of forty files later.
Gateways: the same idea, one layer out
A model gateway moves the routing decision out of your binary and into a service. You call one endpoint with a "provider/model" string like anthropic/claude or openai/gpt, and the gateway handles auth, routing, retries, and often a unified billing and logging view. You can run an open-source one yourself or use a hosted product.
The appeal is real. You change a model without a deploy, you get one place to watch spend and latency, and failover can live in the gateway instead of your code. The honest cost is a new dependency in your hot path, an extra network hop, and a vendor whose own availability now matters to you. A gateway reduces provider lock-in and quietly introduces gateway lock-in. That trade is usually worth it at scale and overkill for a single feature.
If your bet is heavily on one provider's strongest models and you want that done well rather than abstracted into mush, a focused engagement like Claude integration can give you the best of that provider while still keeping the call site behind your own interface, so the door stays open.
Keep prompts and evals provider-neutral
The abstraction layer fails you if your prompts are secretly tuned to one model's quirks. Two habits keep prompts portable.
First, write prompts as plain instructions, not as exploits of a specific model's formatting tics. The more your prompt depends on undocumented behavior, the more it breaks on the next provider, and frankly on the next version of the same provider.
Second, own your evals and keep them provider-agnostic. An eval suite that runs the same test cases against any model behind your interface is the thing that tells you whether a switch is safe. Without it, "swappable in principle" means "untested in practice," and you will not pull the trigger when you need to because you cannot prove the new model behaves. Evals are the difference between a portability story you believe and one you can act on under pressure.
Store prompts as data, version them, and tag which model and version each was validated against. When a model gets deprecated, you want to rerun evals against the replacement and see the deltas, not rediscover your prompts by hand.
The features that quietly trap you
Some provider features are genuinely great and also one-way doors. They are worth using with eyes open. The question is never "is this feature good," it is "what does leaving cost me later."
| Provider-specific feature | Why it is tempting | The trap |
|---|---|---|
| Proprietary tool/function-calling formats | Cleaner than rolling your own | Tool definitions and parsing get rewritten on switch |
| Hosted prompt caching and context tricks | Real cost and latency wins | Pricing math and behavior assume that provider |
| Provider-managed RAG or file stores | Less infra to run | Your data and retrieval logic live in their walls |
| Fine-tunes and custom model variants | Better task performance | Not portable at all; you start over elsewhere |
| Structured-output and schema modes | Reliable parsing | Each provider does it differently enough to hurt |
You do not have to avoid these. You have to confine them. Use a proprietary tool-calling format, but normalize tool definitions to your own shape behind the adapter so the application never sees the provider's version. Use prompt caching, but treat its savings as a bonus rather than a load-bearing assumption in your pricing. The goal is that adopting a trap is a local decision inside one adapter, not a commitment spread across your codebase.
The real cost of portability
Let me be straight about the bill, because the abstraction-everything crowd tends to skip it. Portability is not free, and the price is paid in capability.
When you design to a neutral interface, you target the intersection of what providers offer, not the union. You skip the cool provider-specific feature that would have shaved 30% off latency, because it does not generalize. You write and maintain adapters and an eval harness that produce no customer-visible feature. And you carry a small ongoing tax: every new provider capability has to be evaluated for whether it fits your interface or breaks your neutrality.
That is a genuine cost. For a feature where the provider's edge is the product, paying it can be the wrong call.
So when is lock-in worth it?
Often, actually. Here is the test I use.
Accept lock-in when the provider-specific capability is the reason the feature is good, when switching is cheap relative to the upside because the surface is small, or when the feature is low-stakes enough that an outage or price bump is an annoyance rather than an incident. A clever fine-tune that makes your flagship feature noticeably better is worth being stuck on if that feature is your moat.
Pay for portability when the feature touches revenue or core workflows, when an outage would be a serious incident, when your unit economics are thin enough that a price change changes the business case, or when compliance and data residency mean you might be forced to move on short notice.
| Signal | Lean locked-in | Lean portable |
|---|---|---|
| Feature criticality | Internal or experimental | Core product, revenue-touching |
| Cost of an outage | Annoyance | Incident |
| Margin sensitivity to token price | Comfortable | Thin |
| Provider edge for this task | Decisive | Marginal |
| Compliance pressure to relocate | None | Real |
The mistake is not choosing lock-in. The mistake is choosing it by accident, feature after feature, until one day a 90-day deprecation notice lands and you discover the decision was made for you across thirty files.
What I would actually do
If I were standing up LLM features today, I would put an adapter in front of providers on day one, because it costs almost nothing early and a fortune late. I would keep prompts neutral and version them, and I would build a small eval harness before I needed it, since that harness is what converts "we could switch" into "we can switch this week." I would let high-value features lean on one provider's strengths on purpose, confined behind an adapter, with that choice written down so the next engineer knows it was deliberate. And I would reach for a gateway only once I had more than a couple of features or real multi-provider traffic to justify the extra hop.
Keep the door unlocked where it is cheap to do so, walk through the provider-specific door on purpose where the prize is worth it, and write down which doors you locked behind you. That is the whole discipline.