𝐔𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝𝐢𝐧𝐠 𝐑𝐀𝐆 𝐟𝐫𝐨𝐦 𝐒𝐜𝐫𝐚𝐭𝐜𝐡
Let's understand how RAG works from scratch.
RAG stands for Retrieval-Augmented Generation. It reduces hallucinations in LLMs by supplying them with relevant context retrieved from external knowledge sources.
Understanding how RAG works from scratch is important for AI/ML Engineers.
𝐇𝐨𝐰 𝐝𝐨𝐞𝐬 𝐑𝐀𝐆 𝐰𝐨𝐫𝐤?
RAG involves four steps, namely indexing, retrieval, augmentation, and generation. The indexing step is done only once, while the retrieval, augmentation, and generation steps are repeated for each user query.
𝐈𝐧𝐝𝐞𝐱𝐢𝐧𝐠
This step starts with extracting the content of raw documents (parsing) and then breaking them up into smaller pieces called chunks.
An embedding model turns these chunks into vector embeddings, which are then stored in a vector database.
1️⃣ Parse: Extract content from the documents (web pages, PDFs, etc.).
2️⃣ Chunk: Split the extracted content into smaller, meaningful segments called chunks.
3️⃣ Encode: The embedding model converts the chunks into embeddings.
4️⃣ Store: Save the chunk embeddings in a vector database for quick and efficient similarity search.
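The indexing steps above can be sketched in plain Python. This is a minimal, framework-free illustration: the hash-based `embed` function and the in-memory `build_index` list are toy stand-ins for a real embedding model and a real vector database.

```python
import hashlib
import math

def chunk_text(text, chunk_size=200, overlap=50):
    """Split extracted text into overlapping character-based chunks.
    Real pipelines often chunk by sentences, paragraphs, or tokens."""
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

def embed(text, dim=64):
    """Toy embedding: hash each word into one of `dim` buckets and
    L2-normalize the counts. A real pipeline would use a trained
    embedding model here instead."""
    vec = [0.0] * dim
    for word in text.lower().split():
        word = word.strip(".,?!")
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_index(documents):
    """'Vector database' stand-in: an in-memory list of
    (chunk, embedding) pairs built once, at indexing time."""
    index = []
    for doc in documents:
        for chunk in chunk_text(doc):
            index.append((chunk, embed(chunk)))
    return index
```

Because indexing runs only once, this is also where most of the engineering choices live: chunk size, overlap, and the embedding model all affect retrieval quality later.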
RAG from Scratch (Webinar)
In this webinar, you will understand and implement RAG from scratch, without using any frameworks like LangChain or LlamaIndex.
This webinar covers:
What is RAG?
How does RAG address LLM limitations?
How does RAG work? (Indexing → Retrieval → Augmentation → Generation)
RAG implementation from scratch without any frameworks
Practical RAG applications
RAG limitations
➡️ Register
𝐑𝐞𝐭𝐫𝐢𝐞𝐯𝐚𝐥
This step starts with encoding the user query: it is transformed into an embedding vector using the same embedding model used in the indexing step.
The semantic search feature in the vector database uses the query embedding to find and return the most relevant chunks.
1️⃣ Query: The user asks a query.
2️⃣ Encode: The embedding model converts the user query into an embedding.
3️⃣ Semantic search: The most relevant chunks are found by comparing the query embedding against the chunk embeddings stored in the vector database.
4️⃣ Relevant chunks: The most relevant chunks are returned and serve as context for the LLM.
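A minimal retrieval sketch, continuing the toy setup: the hash-based `embed` function stands in for the real embedding model (what matters is that it is the same one used at indexing time), and semantic search reduces to a dot product because the vectors are unit-normalized.

```python
import hashlib
import math

def embed(text, dim=64):
    """Same toy hash-based embedding used at indexing time; a real
    pipeline must reuse the identical embedding model for queries."""
    vec = [0.0] * dim
    for word in text.lower().split():
        word = word.strip(".,?!")
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query, index, top_k=3):
    """index: list of (chunk, embedding) pairs. On unit vectors,
    cosine similarity is just a dot product; a vector database does
    this search with approximate-nearest-neighbor structures."""
    q = embed(query)
    def score(item):
        return sum(a * b for a, b in zip(q, item[1]))
    ranked = sorted(index, key=score, reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

# Tiny example corpus, pre-embedded as the indexing step would do.
chunks = ["Paris is the capital of France.",
          "The Nile is a river in Africa.",
          "Python is a programming language."]
index = [(c, embed(c)) for c in chunks]
top = retrieve("What is the capital of France?", index, top_k=1)
```

Here the query shares several words with the Paris chunk, so it scores highest; with a trained embedding model, semantically similar chunks rank high even without word overlap.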
𝐀𝐮𝐠𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧
In this step, retrieved relevant chunks are combined to form a context. The context is then combined with the query and instructions to arrive at the LLM prompt.
1️⃣ Combine: Texts from relevant chunks are concatenated into a single string called context.
2️⃣ Augment: The context is combined with the query and instructions to obtain the LLM prompt.
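A sketch of the augmentation step. The prompt template below is purely illustrative; real applications tune the instruction wording to their use case.

```python
def augment(query, relevant_chunks):
    """Concatenate the retrieved chunks into a single context string,
    then wrap it with instructions and the user query to form the
    final LLM prompt."""
    context = "\n\n".join(relevant_chunks)
    prompt = (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )
    return prompt

prompt = augment("What is the capital of France?",
                 ["Paris is the capital of France."])
```

The "only the context" instruction is what grounds the model: it pushes the LLM to answer from the retrieved chunks rather than from its parametric memory.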
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐨𝐧
In this step, the prompt containing the user query, instructions, and context is given to the LLM. The LLM processes the prompt and generates a response grounded in the given context.
1️⃣ Feed: The prompt, consisting of the query, instructions, and context, is given to the LLM.
2️⃣ Generate: The LLM generates a response grounded in the given context.
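The generation step is just one call to the model. A sketch with a stubbed `call_llm` so it runs offline; in a real system this function would be an API request or a local model invocation, and its response here is hypothetical.

```python
def call_llm(prompt):
    """Stand-in for a real LLM call (e.g., an HTTP request to a hosted
    model). Stubbed with a canned reply so this sketch runs offline."""
    return "Paris."  # hypothetical model output for illustration

def generate(prompt):
    """Feed the augmented prompt to the LLM and return its response,
    which should be grounded in the context inside the prompt."""
    return call_llm(prompt)

prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\nParis is the capital of France.\n\n"
    "Question: What is the capital of France?\n"
    "Answer:"
)
answer = generate(prompt)
```

Swapping the stub for a real model call is the only change needed to make the whole indexing → retrieval → augmentation → generation loop a working RAG pipeline.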