
Building an AI Chatbot That Actually Knows Your Business Data

How retrieval-augmented generation (RAG) lets an AI chatbot answer from your real data with citations - and why most business chatbots fail without it. The architecture, explained.

Will Driscoll · 10 min read

"We added an AI chatbot and it makes things up." We hear this constantly. The problem is almost never the chatbot - it is that the chatbot has no connection to your actual data, so it answers from the model's general training instead of from the truth about your business.

The fix is a technique called retrieval-augmented generation (RAG). This article explains what RAG is, why it is the difference between a chatbot that helps and one that embarrasses you, and the architecture we use to build chatbots that answer from real business data with citations.

Why ungrounded chatbots fail

A large language model is trained on a huge corpus of public text. Out of the box, it knows a lot about the world in general and nothing specific about your business. Ask it "what is our refund policy?" and it will produce a plausible-sounding refund policy - which has nothing to do with your actual policy. That is a hallucination, and in a business context it is dangerous.

The model is not broken. It is doing exactly what it was designed to do: generate plausible text. The problem is you have not given it the facts. RAG gives it the facts.

What RAG actually is

Retrieval-augmented generation works in three steps:

  1. Retrieve. When the user asks a question, the system searches your data for the most relevant pieces of information.
  2. Augment. It inserts those relevant pieces into the prompt, alongside the user's question.
  3. Generate. The model is instructed to answer the question from the retrieved information rather than from its general training.

The result: the model answers from your actual data, and (when designed well) cites which document the answer came from. If the answer is not in your data, a well-built RAG system says "I do not have that information" instead of making something up.

That citation-and-honesty behaviour is what separates a business-grade chatbot from a liability.
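To make the three steps concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two documents are made up, retrieval is a crude shared-word count rather than real semantic search, and `generate` stands in for whatever model call you use. It is not our production code, only the shape of the loop.

```python
from typing import Callable

# Hypothetical knowledge base: (source name, passage text).
DOCS = [
    ("refund-policy.md", "Refunds are available within 30 days of purchase."),
    ("shipping.md", "Orders ship within 2 business days."),
]

REFUSAL = "I do not have that information."


def words(text: str) -> set[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(question: str, docs, min_overlap: int = 2):
    """Step 1: keep passages that share at least min_overlap words with the question."""
    q = words(question)
    scored = [(len(q & words(text)), source, text) for source, text in docs]
    scored.sort(key=lambda row: row[0], reverse=True)
    return [(source, text) for n, source, text in scored if n >= min_overlap]


def augment(question: str, passages) -> str:
    """Step 2: place the retrieved passages in the prompt next to the question."""
    context = "\n".join(f"[{source}] {text}" for source, text in passages)
    return (
        "Answer only from the sources below and cite them.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


def answer(question: str, docs, generate: Callable[[str], str]) -> str:
    """Step 3: generate from the augmented prompt, or refuse if nothing was retrieved."""
    passages = retrieve(question, docs)
    if not passages:
        return REFUSAL
    return generate(augment(question, passages))
```

Calling `answer("What is your office address?", DOCS, generate)` returns the refusal without ever reaching the model, because nothing relevant was retrieved.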

The architecture, piece by piece

Here is what goes into a RAG chatbot that works.

1. The knowledge base

Your data, wherever it lives - documents, database records, help articles, policies, past tickets. The first step is identifying what the chatbot should know and getting access to it.

The quality of the chatbot is bounded by the quality of the knowledge base. Garbage in, garbage out. Part of building a good RAG system is curating what goes in.

2. Chunking

Documents get split into chunks - passages of a few hundred words. You retrieve chunks, not whole documents, because you want to give the model the specific relevant passage, not a 50-page PDF.

Chunking well is an underrated art. Chunk too small and you lose context; too large and retrieval gets imprecise. Good chunking respects the structure of the content (sections, paragraphs).
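One simple way to respect structure is to group whole paragraphs until a word budget is reached. The sketch below assumes paragraphs are separated by blank lines and uses a hypothetical `max_words` limit of 300; real content usually needs more care with headings, tables, and lists.

```python
def chunk_text(text: str, max_words: int = 300) -> list[str]:
    """Group whole paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words is kept intact as its own
    oversized chunk rather than being cut mid-paragraph.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for para in paragraphs:
        n = len(para.split())
        # Start a new chunk if adding this paragraph would exceed the budget.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Because chunks break only at paragraph boundaries, a sentence is never cut in half, which is one of the more common ways naive fixed-size chunking loses context.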

3. Embeddings and the vector store

Each chunk is converted into an embedding - a numerical representation of its meaning. These go into a vector store (we use Supabase pgvector or a dedicated vector database depending on scale).

When a question comes in, it is embedded the same way, and the vector store finds the chunks whose meaning is closest. This is semantic search - it matches on meaning, not keywords.
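The mechanics of "embed, store, find the closest" can be sketched in plain Python. Note the large assumption: `toy_embed` below is a word-count vector standing in for a real embedding model, so it only matches on shared words and does not capture meaning. The `VectorStore` class is likewise a hypothetical in-memory stand-in, not the pgvector API. What carries over to a real system is the cosine-similarity ranking.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory store: keeps each chunk alongside its vector."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self.rows.append((chunk, toy_embed(chunk)))

    def search(self, question: str, k: int = 3) -> list[tuple[float, str]]:
        """Embed the question the same way and return the k closest chunks."""
        q = toy_embed(question)
        scored = [(cosine(q, vec), chunk) for chunk, vec in self.rows]
        return sorted(scored, key=lambda row: row[0], reverse=True)[:k]
```

With a real embedding model in place of `toy_embed`, a question about "getting my money back" could land near a chunk about refunds even though the two share no words. That is the part this toy version cannot show.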

4. Retrieval and re-ranking

The vector store returns the top candidate chunks. For higher accuracy, a re-ranking step (often a model smaller and faster than the one that generates the answer) scores each candidate against the question and reorders them by actual relevance. This step meaningfully improves answer quality for harder questions.
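The two-stage shape looks roughly like this. The sketch assumes a hypothetical `score_pair` function that scores one question-chunk pair. In practice that would be a call to a re-ranking model, and the word-overlap scorer here is only a toy stand-in so the example runs.

```python
from typing import Callable


def rerank(
    question: str,
    candidates: list[str],
    score_pair: Callable[[str, str], float],
    keep: int = 3,
) -> list[str]:
    """Rescore every candidate against the question and keep the best few."""
    scored = [(score_pair(question, chunk), chunk) for chunk in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:keep]]


def overlap_score(question: str, chunk: str) -> float:
    """Toy scorer: the fraction of question words that appear in the chunk."""
    q = set(question.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0
```

The design point is that the vector store casts a wide net cheaply, and the more careful scorer only has to look at the handful of candidates that come back.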

5. The generation prompt

The retrieved chunks plus the user's question go into a carefully designed prompt that instructs the model to:

  • Answer only from the provided information
  • Cite which chunk each part of the answer came from
  • Say "I do not have that information" if the answer is not in the retrieved chunks

This prompt is where a lot of the engineering value lives. It is the difference between a chatbot that hallucinates and one that is honest.
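As an illustration, the three instructions above can be assembled into a prompt like this. The exact wording and the `build_prompt` helper are assumptions for the sake of the example, not the prompt we ship. Real prompts get tuned against an evaluation set.

```python
REFUSAL = "I do not have that information"


def build_prompt(question: str, chunks: list[str]) -> str:
    """Build a grounded prompt: numbered sources, citation rule, refusal rule."""
    # Number each chunk so the model has something concrete to cite.
    numbered = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the numbered sources below.\n"
        "After each claim, cite the source it came from, for example [1].\n"
        f'If the sources do not contain the answer, reply exactly: "{REFUSAL}"\n\n'
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}"
    )
```

Numbering the chunks is what makes citations checkable: the application can map `[2]` in the answer back to the document the second chunk came from and show the user a link.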

6. The model

The actual LLM that generates the answer. We keep this model-agnostic via a routing layer so we can use the best model for the task and swap as better models ship.
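A routing layer can be as small as a lookup from task to model. In this sketch, `ModelRouter` and the task and model names are hypothetical, and each registered function stands in for a real provider call. The rest of the system only ever calls `generate`, so swapping a model is a one-line change.

```python
from typing import Callable, Dict

# A model is anything that turns a prompt into text.
ModelFn = Callable[[str], str]


class ModelRouter:
    """Hypothetical routing layer: maps a task name to a registered model."""

    def __init__(self, default: str) -> None:
        self._models: Dict[str, ModelFn] = {}
        self._routes: Dict[str, str] = {}
        self._default = default

    def register(self, name: str, fn: ModelFn) -> None:
        """Make a model available under a name."""
        self._models[name] = fn

    def route(self, task: str, model_name: str) -> None:
        """Send a given task to a given model."""
        self._routes[task] = model_name

    def generate(self, task: str, prompt: str) -> str:
        """Run the prompt on the model routed for this task, or the default."""
        name = self._routes.get(task, self._default)
        return self._models[name](prompt)
```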

7. The UI

The chat interface - which is often the easiest part, and the part platforms oversell. A nice chat UI on top of a bad retrieval system is still a bad chatbot. Get the retrieval right first.

Where this lives in your stack

For most businesses, the RAG system is a self-contained code service with a narrow interface that sits alongside your existing application:

  • Your app stays your app (whether it is on Bubble, a custom stack, or anything else)
  • The RAG service handles the chunking, embedding, retrieval, and generation
  • Your app calls the RAG service and renders the conversation
  • The knowledge base syncs from your real data sources

For Bubble apps, this is the same pattern as adding AI features to a Canvas app - keep the orchestration in code, let Bubble do the UI.
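The boundary between the app and the RAG service can be a small JSON contract. The field names below (`answer`, `citations`, `source`, `excerpt`) are assumptions chosen for illustration. The point is only that the app receives an answer plus its citations and does nothing but render them.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Citation:
    source: str   # which document the chunk came from
    excerpt: str  # the passage the answer relied on


@dataclass
class ChatResponse:
    answer: str
    citations: list[Citation] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the response for the calling app to render."""
        return json.dumps(asdict(self))
```

Keeping the contract this narrow is what lets the front end stay on whatever platform it is already on while the retrieval logic evolves behind it.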

The things that make RAG hard

The architecture above sounds straightforward. The hard parts in practice:

  • Keeping the knowledge base fresh. Your data changes. The vector store needs to update when it does. This is an ongoing sync problem, not a one-time index.
  • Handling "I do not know" gracefully. The system has to recognise when it does not have the answer and say so, rather than retrieving the closest-but-wrong chunk and answering confidently.
  • Access control. If different users should see different data, retrieval has to respect that - a user should never get an answer based on a chunk they are not allowed to see.
  • Evaluation. Knowing whether the chatbot is actually answering correctly requires a test set of questions with known answers, run regularly. Without evaluation you are flying blind.

These are the parts that separate a weekend prototype from a production system. They are also where most "we added a chatbot" projects fall down.
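Two of these can be sketched briefly. The first function filters retrieved chunks by the user's roles and drops weak matches, so an empty result becomes the signal to say "I do not have that information". The second runs a test set against the chatbot. The `Chunk` shape, the role names, the 0.5 threshold, and the substring check are all assumptions for illustration. A production system would typically apply the permission filter inside the vector store query and use a more careful grading method.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_roles: frozenset  # roles permitted to see this chunk


def retrieve_for_user(
    scored_chunks: list[tuple[float, Chunk]],
    user_roles: set,
    min_score: float = 0.5,
    k: int = 3,
) -> list[Chunk]:
    """Keep only chunks the user may see and that clear the relevance bar.

    An empty list means the caller should refuse rather than answer.
    """
    visible = [(s, c) for s, c in scored_chunks if c.allowed_roles & user_roles]
    confident = [(s, c) for s, c in visible if s >= min_score]
    confident.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in confident[:k]]


def evaluate(chatbot: Callable[[str], str], test_set: list[tuple[str, str]]) -> float:
    """Return the fraction of questions whose reply contains the expected text."""
    if not test_set:
        return 0.0
    passed = sum(
        1 for question, expected in test_set
        if expected.lower() in chatbot(question).lower()
    )
    return passed / len(test_set)
```

A useful test set includes questions the chatbot should refuse, with the refusal text as the expected answer, so that honesty is measured alongside accuracy.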

How to know if you need RAG

You need RAG (not just a plain chatbot) if:

  • The chatbot needs to answer questions specific to your business, products, or customers
  • The answers must be accurate and grounded, not plausible-sounding
  • The information changes over time
  • Different users should see different information

You might not need RAG if:

  • The chatbot only needs to do general tasks (writing assistance, brainstorming) with no business-specific knowledge
  • The knowledge is small and static enough to fit directly in the prompt

What to do next

If you want a chatbot that actually knows your business data, book a 30-minute discovery call. We will look at your data sources and tell you what a grounded chatbot would take.
