Prompt engineering slowing you down? It’s time to try RAG and here's why.
Yuval Zukerman | 2024-02-26 | 4 min read
GenAI and LLMs promise productivity gains and cost savings. There is so much promise, and enough hype, that management asked you to build GenAI PoCs. You tried prompt engineering and achieved some great results. But your prompts are getting too long, and you really can't automate them. Results also seem frozen in time, months out of date. And you want your LLM to produce results that more closely match what your colleagues actually need. If that's the case, it's time you started using retrieval-augmented generation (RAG). RAG helps LLMs produce relevant and up-to-date results, and it even reduces the likelihood of hallucinations.
Why is RAG needed?
LLMs are trained on massive datasets of text, code, and more. Training teaches the model complex statistical relationships between words, and the model then uses those relationships as the basis for text generation. Training data, however, doesn't guarantee accurate outputs. New information is published everywhere, all the time, and anything missing from the training dataset is absent from the LLM's 'knowledge.' Worse, prompting an LLM about information missing from its training dataset may produce hallucinations: confident answers that have nothing to do with reality. And since your corporate information was not part of the LLM's training dataset, prompting it about company-specific topics is a dead end. That's where RAG comes to the rescue.
How does RAG work?
RAG addresses these limitations by combining two key components:
- Retrieval: This component acts like a search engine for your prompt. It scans the knowledge bases you specify, such as corporate document collections or reams of drug research papers, and returns the documents relevant to the prompt.
- Generation: In this second step, RAG passes the retrieved documents, along with the original prompt, to an LLM. The LLM then generates a response using both its language capabilities and the retrieved information, as sketched below. The result is grammatically correct, relevant, and better grounded in facts.
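To make the generation step concrete, here's a minimal sketch of how the retrieved text and the original prompt get combined. The passage, question, and prompt format are invented for illustration; real RAG frameworks do something along these lines:

```python
# The retrieval step found this passage in your knowledge base
# (the passage and question are invented for illustration).
retrieved_passage = (
    "Per the 2024 employee handbook, remote employees may expense "
    "up to $500 per year for home-office equipment."
)
user_question = "What is the home-office equipment budget?"

# Generation step: the LLM sees the retrieved facts next to the question,
# so its answer is grounded in your documents, not just its training data.
augmented_prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{retrieved_passage}\n\n"
    f"Question: {user_question}"
)
print(augmented_prompt)  # This is what actually gets sent to the LLM.
```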
Let's dive a little deeper
RAG's retrieval step depends on specialized 'vector databases.' When you load data and documents into a vector database, each document is converted into an embedding: a numerical vector that captures its meaning, much like the word relationships LLMs learn during training. Because similar meanings map to nearby vectors, vector databases can quickly find the documents most relevant to a given prompt.
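Here's a toy illustration of that idea, with made-up three-dimensional vectors standing in for real embeddings (which typically have hundreds or thousands of dimensions):

```python
import numpy as np

# Made-up 3-dimensional embeddings. Real embedding models produce much
# larger vectors, but the principle is the same: texts with similar
# meanings map to nearby vectors.
query_vec = np.array([0.9, 0.1, 0.3])  # "How do I reset my password?"
doc_a_vec = np.array([0.8, 0.2, 0.4])  # a password-reset help article
doc_b_vec = np.array([0.1, 0.9, 0.7])  # a quarterly revenue report

def cosine(a, b):
    """Cosine similarity: values near 1.0 mean very similar direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(query_vec, doc_a_vec))  # ~0.98 -- relevant, gets retrieved
print(cosine(query_vec, doc_b_vec))  # ~0.36 -- unrelated, gets skipped
```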
So, going step by step, RAG follows this pattern:
- Retrieval Step
  - Your prompt is converted into an embedding, which you use to query the vector database.
  - The vector database returns the documents (or document chunks) whose embeddings are most similar to your prompt's.
- Generation Step
  - You send the LLM the original prompt along with the retrieved text.
  - The LLM responds with results that combine its existing 'knowledge' with the information you retrieved.
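Putting it all together, here's a minimal end-to-end sketch. As before, the documents and query are invented, and a simple word-count vector stands in for a real embedding model and vector database:

```python
import math
import re
from collections import Counter

# A tiny invented knowledge base; in practice these would be chunks of
# your corporate documents indexed in a vector database.
documents = [
    "Remote employees may expense up to $500 per year for home-office equipment.",
    "The cafeteria is open on weekdays from 8am to 3pm.",
    "All expense reports must be filed within 30 days of purchase.",
]

def embed(text):
    """Stand-in 'embedding': a bag-of-words count vector. A real system
    would call an embedding model here instead."""
    return Counter(re.findall(r"[\w-]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, top_k=1):
    """Retrieval step: rank documents by similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:top_k]

# Generation step: fold the retrieved text into the prompt, as sketched earlier.
query = "How much can I expense for home-office equipment?"
context = "\n".join(retrieve(query, documents))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # Send this augmented prompt to the LLM of your choice.
```

In a production system, you would swap `embed` for calls to an embedding model and replace the document list with a vector database query, but the retrieve-then-augment flow stays the same.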
The Future of RAG
While RAG is a big step forward and offers great promise, its impact and effectiveness are still evolving. Researchers continue to work on improving the relevance of retrieved results, and rapid progress in natural language understanding and reasoning will further improve LLMs' ability to weave external information seamlessly into their generated outputs.
Want to learn more? How about a webinar?
Ready to try RAG out? Then Domino's on-demand webinar, 'An Introduction to Retrieval-Augmented Generation (RAG),' is right for you. Hosted by Domino's John Alexander, the webinar shows you how to get started, walks through the technical components you need, and offers code samples. Better yet, you will see how Domino accelerates the entire effort. Don't wait - watch now!
Want to try things for yourself? Clone the code from our repo.
As Domino's content lead, Yuval makes AI technology concepts more human-friendly. Throughout his career, Yuval has worked with companies of all sizes and across industries. His unique perspective comes from holding roles ranging from software engineer and project manager to technology consultant, sales leader, and partner manager.