
RAG vs Fine-Tuning vs Long Context: Which, and When
Three ways to get a model to use your data, often confused. When retrieval, fine-tuning, or a big context window is the right call, and when to combine them.
Key takeaways
- Use RAG for knowledge that changes and needs citations, since the model's weights never change and you update a database you control.
- Fine-tuning teaches behavior, format, and tone, not facts, so putting your product catalog in weights makes answers fuzzier rather than more accurate.
- Long context fits one-off tasks over a single document, but you pay for every token on every call and models lose details buried in the middle.
- Mature systems stack all three: fine-tuning for behavior, retrieval for fresh facts, and the context window as the workspace where they meet.
The short answer first
If your data changes and you need the model to cite where an answer came from, use retrieval (RAG). If you need the model to behave a certain way, follow a format, or handle a narrow task more reliably, fine-tune it. If you have a one-off task over a document that fits in the window and you want to ship today, just put the whole thing in the context.
Most teams I talk to have these three mixed up. They fine-tune to teach the model facts (it doesn't really learn facts that way), or they dump a 200-page handbook into every prompt and then wonder why the bill tripled and the answers got worse. So before you pick, it helps to be clear on what each one actually does.
What each approach actually does
RAG: retrieval-augmented generation
RAG is a search step glued to the front of the model. At question time, you go find the relevant chunks of your data, hand them to the model along with the question, and ask it to answer using what you gave it. The model's weights never change. Your knowledge lives in a database you control.
Good at: data that changes (prices, policies, docs, tickets), answers that need a citation, and keeping different customers' data separate. You update a record, and the next answer reflects it. No retraining.
Bad at: questions that need a view across everything at once ("what are the top themes across 10,000 tickets?"), and anything where retrieval quality is poor. RAG is only as good as its search. If the right chunk never makes the top results, the model never had a chance. I've written more about that failure mode separately, but the short version is that retrieval, not the model, is usually what's broken.
Fine-tuning: changing the model's behavior
Fine-tuning continues training on your own examples so the model adjusts its weights. The right mental model is teaching style, format, and task behavior, not loading in facts. You fine-tune so the model always returns valid JSON in your schema, or matches your support team's tone, or classifies tickets into your 14 categories without a three-paragraph prompt every time.
Good at: consistent output format, a specific voice or style, narrow repetitive tasks where you have a few hundred to a few thousand good examples, and cutting prompt length (the behavior is baked in, so you stop paying for a giant instruction block on every call).
Bad at: facts. This is the single most common mistake. Fine-tuning on your product catalog does not make the model a reliable lookup for your product catalog. It might recite some of it, get details subtly wrong, and have no idea when your prices changed last week. Facts that move belong in retrieval, not in weights.
Long context: just put it in the prompt
Modern models take very large context windows, hundreds of thousands of tokens, sometimes more. The simplest possible approach is to skip retrieval and skip training, and paste the relevant material straight into the prompt.
Good at: speed of building (there is almost nothing to build), one-off or low-volume tasks, and cases where the relevant material genuinely fits and you want the model to reason over all of it together. Summarizing one contract, comparing two documents, analyzing a single long transcript.
Bad at: cost and recall at scale. You pay for every token on every call, so a big static document in every prompt gets expensive fast. And bigger is not automatically better. Models reliably lose details buried in the middle of a very long window, so stuffing everything in can actually lower answer quality even when the cost is fine. A large window is a convenience, not a filing system.
The comparison, side by side
| Dimension | RAG | Fine-tuning | Long context |
|---|---|---|---|
| Best for | Dynamic knowledge, lookups | Format, style, narrow tasks | One-off reasoning over a document |
| When facts change | Update the data, done | Retrain (slow, painful) | Re-paste the new version |
| Freshness | Real-time | Stale until retrained | As fresh as what you paste |
| Citations | Yes, you know the source | No | Weak, it saw the text but won't reliably attribute |
| Setup effort | Medium to high (pipeline, retrieval, evals) | Medium (need good labeled data) | Low (paste and go) |
| Cost shape | Per-query retrieval, modest tokens | Up-front training, cheaper per call | High per call, scales with prompt size |
| Scales to large corpora | Yes | Indirectly (changes behavior, not knowledge) | No, you hit window and cost limits |
| Time to first version | Days to weeks | Days to weeks | Hours |
A decision guide
Start with what you are actually trying to fix.
Your problem is knowledge. The model needs to answer from your data, and that data changes. Use RAG. This covers most internal-knowledge, customer-support, and document-Q&A products. If you need citations or per-tenant isolation, this is settled, it's RAG. If you want help building that pipeline well (chunking, hybrid search, reranking, evals), that is exactly the kind of work we do in RAG development.
Your problem is behavior. The model knows enough, but it won't consistently follow your format, tone, or task definition, and prompting only gets you part of the way. Fine-tune. A good sign you are ready: you have a few hundred clean examples of the input and the exact output you want.
Your problem is one document, right now. You have a single contract, report, or transcript and you need an answer today, at low volume. Put it in the context window and move on. Don't build a retrieval pipeline for something you'll run twice.
You are not sure. Default to long context to prototype and learn what "good" looks like, then move the knowledge part to RAG once volume or cost makes the prompt approach hurt. Reach for fine-tuning last, only after prompting and retrieval have plateaued.
They are not mutually exclusive
The framing as a three-way fight is misleading. In real systems these stack.
A mature setup often fine-tunes a model so it reliably produces the right format and follows instructions, uses RAG to feed it fresh, citable facts at query time, and uses the context window to hold the retrieved chunks plus a few examples. Each layer does the job it is actually good at. The fine-tune handles how the model behaves. Retrieval handles what it knows right now. The window is just the workspace where the two meet.
The point of separating them up front is so you put each concern in the right place: facts in retrieval, behavior in weights, and the immediate working set in the prompt.
The mistakes I see most
A few patterns are common enough to call out directly.
Fine-tuning to teach facts. It feels like it should work and it mostly doesn't. The model gets fuzzier, not more accurate, and it can never tell you it's out of date. Put facts in retrieval.
Dumping everything into context. A 300-page policy PDF in every prompt is not a knowledge base, it's a recurring bill with a recall problem. If you find yourself pasting the same large block on every call, you wanted retrieval.
Reaching for fine-tuning first. It's the heaviest tool, needs labeled data, and is the slowest to update. Try prompting and retrieval before you train anything.
Skipping retrieval evals. If you go with RAG, measure whether the right chunk actually shows up in your results before you blame the model for a bad answer. Most RAG failures are search failures wearing a model's clothes.
Where to land
Pick by the problem, not the hype. Knowledge that changes goes in RAG. Behavior that needs to be consistent goes in a fine-tune. A single document you need answered today goes in the context window. And when you are building something real, expect to combine them, with each technique doing only the part it's genuinely good at.