RAG vs Fine-Tuning vs Prompting: Which Approach for Your AI Product

A decision framework for the three ways to make an AI model do what you want: prompting, retrieval-augmented generation, and fine-tuning. When each fits, and when to combine them.

Will Driscoll · 19 May 2026 · 9 min read

When you want an AI model to do something specific for your product, you have three tools: prompting, retrieval-augmented generation (RAG), and fine-tuning. Teams often reach for fine-tuning first because it sounds the most serious. It's usually the wrong first choice.

This article is a decision framework for which approach fits your problem - when to use each, when to combine them, and why the order in which you should try them is almost the reverse of the order people expect.

The three approaches in one line each

Prompting - tell the model what to do in the prompt. No training, instant iteration.
RAG - retrieve relevant information and put it in the prompt so the model answers from your data. No training, dynamic knowledge.
Fine-tuning - train the model on examples so it learns a behaviour or style. Requires data, training, and ongoing maintenance.

Start with prompting

For most problems, a well-engineered prompt with a capable model gets you most of the way. Modern models are extremely good at following detailed instructions. Before reaching for anything heavier, exhaust prompt engineering:

Clear instructions and role definition
Examples in the prompt (few-shot)
Structured output formats
Step-by-step reasoning where it helps

Prompting iterates in seconds. You change the prompt, you see the result. No training runs, no datasets, no deployment of a custom model. The fastest path to "is this even possible?" is always a good prompt.
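
As a rough illustration of the techniques listed above, here is a minimal sketch of a prompt builder that combines a role definition, clear instructions, a structured output format, and few-shot examples. The ticket-triage task, the example tickets, and the names FEW_SHOT and build_prompt are all hypothetical; the resulting string would be sent to whichever model API you use.

```python
import json

# Hypothetical few-shot examples for an illustrative ticket-triage task.
FEW_SHOT = [
    ("I was charged twice this month.", {"category": "billing", "urgent": True}),
    ("How do I change my avatar?", {"category": "account", "urgent": False}),
]


def build_prompt(ticket: str) -> str:
    """Assemble role, instructions, output format, and few-shot examples."""
    parts = [
        "You are a support triage assistant.",  # role definition
        "Classify the ticket below.",           # clear instruction
        # structured output format
        'Reply with JSON only, using the keys "category" (string) '
        'and "urgent" (boolean).',
        "",
    ]
    # Few-shot examples: show the model the exact input/output shape.
    for text, label in FEW_SHOT:
        parts.append(f"Ticket: {text}")
        parts.append(f"Answer: {json.dumps(label)}")
        parts.append("")
    # The real input goes last, with the answer left open for the model.
    parts.append(f"Ticket: {ticket}")
    parts.append("Answer:")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("My invoice is wrong"))
```

Changing the behaviour means editing a string and re-running, which is why iteration here takes seconds.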

If prompting alone solves your problem, stop. You're done. Most teams skip past this too quickly.

Add RAG when the model needs your data

Prompting hits its limit when the model needs to know things it wasn't trained on - your documents, your product catalogue, your customer's specific history, anything proprietary or recent.

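That's RAG. You retrieve the relevant information and include it in the prompt, so the model answers from your actual data and can cite its sources instead of guessing. It reduces hallucination rather than eliminating it: the answer is only as good as what retrieval returns.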
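
The retrieve-then-prompt flow can be sketched in a few lines. This is a toy, not a production design: the three documents are invented, and simple word overlap stands in for the embedding search a real system would typically use. The names DOCS, retrieve, and build_rag_prompt are hypothetical.

```python
import re
from collections import Counter

# Hypothetical knowledge base: document id -> text.
DOCS = {
    "returns.md": "Customers can return items within 30 days of delivery.",
    "shipping.md": "Standard shipping takes 3 to 5 business days.",
    "warranty.md": "Hardware carries a two year warranty covering defects.",
}


def tokens(text: str) -> Counter:
    """Lowercase word counts; a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Return up to k documents sharing at least one word with the question."""
    q = tokens(question)
    scored = []
    for doc_id, text in DOCS.items():
        overlap = sum((q & tokens(text)).values())  # shared word count
        scored.append((overlap, doc_id, text))
    # Highest overlap first; ties broken by document id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(doc_id, text) for overlap, doc_id, text in scored[:k] if overlap > 0]


def build_rag_prompt(question: str) -> str:
    """Put the retrieved passages in the prompt and ask for cited answers."""
    sources = retrieve(question)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in sources)
    return (
        "Answer using only the sources below and cite the source id in "
        "brackets. If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_rag_prompt("How long does shipping take?"))
```

Because each passage carries its document id into the prompt, the model has something concrete to cite, and updating an entry in DOCS changes the next answer without any training.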

Use RAG when:

The answer depends on your specific, private, or current data
The knowledge changes over time (RAG picks up changes as soon as you re-index the source; fine-tuning would need re-training)
You need citations / explainability (RAG can show which document the answer came from)
The knowledge base is large (too big to fit in a prompt)

RAG is the workhorse of business AI products. The overwhelming majority of "make the AI know our stuff" problems are RAG problems, not fine-tuning problems.

Reach for fine-tuning only when prompting and RAG can't get there

Fine-tuning trains the model on examples so it internalises a behaviour. It's powerful but it's also the most expensive and least flexible option: you need a quality dataset, you run training, you maintain the fine-tuned model, and you re-train when things change.

Fine-tuning is the right tool for a narrow set of problems:

Consistent style or format that prompting can't reliably enforce - a very specific tone, structure, or output convention, demonstrated across many examples
A specialised task where you have lots of input/output examples and a smaller fine-tuned model can match a bigger general model at lower cost and latency
Behaviour that's hard to describe but easy to demonstrate - where showing 1,000 examples works better than writing instructions
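
All three cases above depend on a clean set of input/output examples. As a minimal sketch, the helper below turns raw pairs into JSON Lines, dropping empty and duplicate examples along the way. The "input" and "output" field names and the to_jsonl helper are assumptions for illustration; each fine-tuning provider defines its own schema, so check its documentation before uploading anything.

```python
import json


def to_jsonl(pairs: list[tuple[str, str]]) -> str:
    """Serialise (input, output) pairs, skipping empties and exact duplicates."""
    seen = set()
    lines = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        # Empty or repeated examples add no signal, so leave them out.
        if not source or not target or (source, target) in seen:
            continue
        seen.add((source, target))
        # Hypothetical schema: one JSON object per line.
        lines.append(json.dumps({"input": source, "output": target}))
    return "\n".join(lines)


if __name__ == "__main__":
    raw = [
        ("Summarise: order delayed", "Order delayed; apology sent."),
        ("Summarise: order delayed", "Order delayed; apology sent."),  # duplicate
        ("Summarise: refund issued", ""),                              # empty output
        ("Summarise: refund issued", "Refund issued to original card."),
    ]
    print(to_jsonl(raw))
```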

Fine-tuning is almost never the right tool for "make the model know our data" - that's RAG. The classic mistake is fine-tuning on a corpus of documents hoping the model will "learn" them. It doesn't work well; the model learns the style of the documents, not reliable recall of their contents. Use RAG for knowledge.

The decision framework

Your need	Reach for
Get the model to follow instructions / do a task	Prompting
Answer from your private or current data, with citations	RAG
Enforce a consistent style/format prompting can't	Fine-tuning
Reduce cost/latency on a narrow high-volume task	Fine-tuning (smaller model)
Make the model "know" a body of knowledge	RAG (not fine-tuning)
All of the above	Combine: fine-tune for behaviour, RAG for knowledge, prompt for the task
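
The table can also be read as a small decision function. The sketch below is one possible encoding, not a definitive rule set: the Need fields and the recommend function are hypothetical names, and the three flags compress the rows above into yes/no questions.

```python
from dataclasses import dataclass


@dataclass
class Need:
    needs_own_data: bool = False          # private, current, or large knowledge
    style_beyond_prompting: bool = False  # prompting can't reliably enforce it
    narrow_high_volume: bool = False      # cost/latency pressure on one task


def recommend(need: Need) -> list[str]:
    """Return the approaches to use, in the order you would add them."""
    approaches = ["prompting"]  # every product starts here
    if need.needs_own_data:
        approaches.append("RAG")  # knowledge goes in retrieval, not weights
    if need.style_beyond_prompting or need.narrow_high_volume:
        approaches.append("fine-tuning")  # behaviour, cost, or latency
    return approaches


if __name__ == "__main__":
    print(recommend(Need(needs_own_data=True)))
```

Note that fine-tuning never appears on its own and is never triggered by a knowledge need, which mirrors the "RAG (not fine-tuning)" row.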

Combining them

The approaches aren't mutually exclusive. A sophisticated AI product often uses all three:

Prompting structures every interaction
RAG grounds answers in the product's data
Fine-tuning (occasionally) tunes a model for a specific high-volume task or a consistent output style

But you build up to this. Start with prompting. Add RAG when you need your data. Consider fine-tuning only when you've proven prompting and RAG can't get there and you have the data and volume to justify it.

The cost angle

The three approaches have very different cost profiles:

Prompting - cost is per call, no setup. Long prompts cost more per call (token economics).
RAG - cost is per call plus the embedding/retrieval infrastructure. Retrieved passages add tokens to each call, but only the relevant ones rather than the whole knowledge base.
Fine-tuning - cost is training (upfront, recurring on re-train) plus inference. Can be cheaper per call at high volume with a smaller model, but the maintenance overhead is real.

For most products, prompting + RAG is both the cheapest and the most flexible. Fine-tuning earns its place only when the specific economics work out.
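
One way to check whether the economics work out is a break-even calculation: how many calls does it take for the per-call saving of a smaller fine-tuned model to repay the training cost? The sketch below uses invented prices purely for illustration, and it ignores maintenance and re-training, which push the real break-even point higher.

```python
import math


def break_even_calls(
    training_cost: float,
    base_cost_per_call: float,
    tuned_cost_per_call: float,
) -> float:
    """Calls needed before the per-call saving repays the training cost."""
    saving = base_cost_per_call - tuned_cost_per_call
    if saving <= 0:
        return math.inf  # the fine-tuned model never pays for itself
    return training_cost / saving


if __name__ == "__main__":
    # Hypothetical numbers: 500 to train, 0.010 per call on the large
    # general model, 0.002 per call on the smaller fine-tuned model.
    calls = break_even_calls(500.0, 0.010, 0.002)
    print(f"Break-even after roughly {calls:,.0f} calls")
```

With these made-up figures the break-even lands at about 62,500 calls, so a task running a few hundred calls a month would take years to repay the training run.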

What to do next

If you're not sure which approach fits your AI product, book a 30-minute discovery call. We'll look at your specific use case and tell you the simplest approach that gets you there - usually simpler than you expect.
