RAG with Python Cookbook: Learn principles of RAG with LLM and agentic AI, with 120+ recipes (English Edition)
Author: Deepak Dhyani
RAG with Python Cookbook Learn principles of RAG with LLM and agentic AI, with 120+ recipes Deepak Dhyani www.bpbonline.com
First Edition 2026

Copyright © BPB Publications, India
ISBN: 978-93-65895-735

All Rights Reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception of the program listings, which may be entered, stored, and executed in a computer system but may not be reproduced by means of publication, photocopy, recording, or any electronic or mechanical means.

LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY
The information contained in this book is true and correct to the best of the author's and publisher's knowledge. The author has made every effort to ensure the accuracy of these publications, but the publisher cannot be held responsible for any loss or damage arising from any information in this book. All trademarks referred to in the book are acknowledged as properties of their respective owners, but BPB Publications cannot guarantee the accuracy of this information.
Dedicated to

My wife, Shubhra, and my children, Antara and Atharv. Your love, patience, and encouragement made this book possible.
About the Author

Deepak Dhyani is a senior technology leader with extensive experience in enterprise engineering, distributed systems, and intelligent automation. Over his 25+ years of experience, he has spearheaded product engineering, platform modernization, and cloud-native development across global organizations. His technical strengths include expertise in the modern programming languages, frameworks, and patterns required to build enterprise applications. He is well versed in the AI technologies needed to design, build, and integrate AI solutions that deliver the best business outcomes. Deepak has led high-performance engineering teams, contributed to technology strategy, and built scalable solutions across different domains. This book reflects his focus on engineering expertise, architectural clarity, and practical techniques for building real-world, optimized RAG systems.
About the Reviewers

Venkata Kondepati is a seasoned technology leader with over two decades of experience driving digital transformation and cloud architecture initiatives across global enterprises. As former director of software engineering at S&P Global, he led multi-million-dollar modernization projects, including migrating 60+ applications to AWS cloud infrastructure and developing AI-driven data platforms that delivered 70% cost savings. Currently serving as manager of data architecture and engineering at Ascentt, Venkat specializes in cloud-native architectures, generative AI implementation, and building high-performance engineering teams across distributed global organizations. He holds multiple certifications, including AWS Solutions Architect and PMP, and has completed advanced studies in data science and generative AI for business transformation.

Debanshu Das is a senior software engineer and technical lead at Google, where he focuses on generative AI-driven solutions for large-scale recommendation systems in the context of creative advertising on YouTube. His work spans generative AI, agentic workflows, large-scale recommendation engines, and cloud-native distributed systems. He brings prior industry experience from Oracle and Apple, enabling him to bridge theoretical research and production-grade system design. An IEEE Senior Member, his research has been accepted at leading venues, including AAAI and WSDM. He holds a master's degree from Carnegie Mellon University.
Acknowledgement

Writing this book has been an immensely rewarding journey, and I am deeply grateful to the many people who supported me along the way.

First and foremost, I extend my heartfelt thanks to my family. To my wife, Shubhra, for her unwavering encouragement, patience, and constant belief in my work, and to my children, Antara and Atharv, for bringing joy, balance, and inspiration into my life. Their support made it possible for me to dedicate long hours to researching, writing, and refining this book.

Special thanks to the engineers, developers, and practitioners who are building the next generation of AI-powered systems. Their hard work, challenges, and contributions in the field of AI helped me refine the examples, recipes, and practical solutions presented in this book.

Finally, I am grateful to the editorial and publishing team for their guidance, patience, and commitment to maintaining high technical standards throughout the process. Their attention to detail and thoughtful review ensured that this book delivers meaningful value to its readers.

To everyone who contributed directly or indirectly, thank you for helping bring this work to life.
Preface

The rapid evolution of large language models (LLMs) has transformed how organizations retrieve, process, and utilize information. As enterprises increasingly adopt AI to enhance decision-making and automate complex workflows, retrieval-augmented generation (RAG) has emerged as one of the most practical and powerful architectures for building trustworthy, context-aware AI systems. In this landscape, engineering teams require not only conceptual understanding but also hands-on skills to build scalable, production-ready RAG systems.

This book provides a comprehensive, end-to-end guide to RAG engineering using Python. Each chapter is designed to give readers both theoretical insight and practical implementation experience. We begin with the foundational principles of RAG architecture before examining the essential components of any retrieval pipeline: document loaders, text-splitting strategies, embeddings, and vector stores. You will explore different vector retrieval approaches, learn techniques for improving retrieval efficiency, and understand how LLMs generate grounded responses based on retrieved evidence.

There are dedicated chapters on prompt engineering for RAG, advanced search strategies, and building multi-step RAG pipelines with chains. The final chapter introduces agentic retrieval, showing how autonomous agents can adaptively decide what and how to retrieve, enabling more intelligent and dynamic RAG systems.

This book is written for engineers, architects, data scientists, and technical practitioners who want to build robust, production-grade RAG applications. By the end of this book, you will have the skills to design complete RAG pipelines, optimize retrieval workflows, integrate LLMs effectively, and implement advanced agentic retrieval strategies using modern Python libraries.
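To make the load, split, embed, retrieve, generate pipeline described above concrete, here is a minimal sketch in plain Python. This is an illustrative toy, not code from the book: the `embed`, `split`, `retrieve`, and `generate_answer` functions are hypothetical stand-ins, using bag-of-words term frequencies and cosine similarity in place of a real embedding model and vector store, and a string template in place of an LLM call.

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words term frequencies. A real pipeline
    # would call an embedding model instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def split(document, chunk_size=8):
    # Naive fixed-size splitter: chunk_size words per chunk.
    words = document.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def retrieve(query, chunks, k=1):
    # Rank stored chunks by similarity to the query; return the top k.
    ranked = sorted(chunks, key=lambda c: cosine(embed(query), embed(c)),
                    reverse=True)
    return ranked[:k]

def generate_answer(query, context):
    # Placeholder for an LLM call grounded in the retrieved context.
    return f"Q: {query}\nContext: {' | '.join(context)}"

doc = ("FAISS is a library for efficient similarity search. "
       "ChromaDB is an open source vector database. "
       "BM25 is a classic keyword ranking function.")
chunks = split(doc)
context = retrieve("what is FAISS used for", chunks, k=1)
answer = generate_answer("what is FAISS used for", context)
```

Each piece maps onto a pipeline stage covered in the chapters that follow; swapping the toy functions for real loaders, splitters, embedding models, and vector stores yields a production pipeline.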
Chapter 1: Foundation of Retrieval-augmented Generation—RAG enhances AI systems by combining external knowledge with the generative power of large language models. This chapter introduces the core steps of the RAG pipeline, such as loading documents, splitting them into useful chunks, creating embeddings for vector search, and storing them in vector databases. It also explains how relevant information is retrieved and used by LLMs to generate accurate, context-aware responses. By the end, readers will understand each component's role in a complete RAG workflow.

Chapter 2: Document Loaders for RAG Pipelines—This chapter explores how to load documents effectively for RAG pipelines, focusing on the most common source types such as files, web pages, and structured data. It also explains how metadata is extracted and managed to improve retrieval accuracy, and how preprocessing and text normalization prepare content for downstream steps.

Chapter 3: Document Splitting Techniques—This chapter focuses on the essential techniques for splitting documents into meaningful chunks for effective retrieval. It covers n-gram and regex-based splitting, topic-driven segmentation, and methods tailored for Markdown, HTML tags, tables, and page-based formats. Readers will also learn how to group or split documents using metadata and apply time-based segmentation for chronological content. The chapter concludes with guidance on designing custom separators to handle specialized document structures.

Chapter 4: Embedding Strategies for Vector Retrieval—This chapter explores how text is transformed into vector embeddings for efficient semantic retrieval. Readers will learn to embed individual texts and large document sets, use FAISS for high-performance indexing, and apply offline or custom embedding models in LangChain. The chapter also covers batching techniques, persistent storage, metadata-enhanced filtering, and strategies like summary-only embeddings and pre-normalization for noise resistance.

Chapter 5: Vector Stores for Semantic Retrieval—This chapter introduces vector stores as the backbone of semantic retrieval in RAG systems. Readers will learn to create, persist, and query FAISS and ChromaDB stores, apply hybrid search with metadata filtering, and efficiently index documents using batch and async workflows. The chapter also covers integrating vector stores with document splitters, performing semantic search on auto-summarized chunks, evaluating retrieval quality, and using multi-vector strategies that combine dense and sparse representations for improved accuracy.

Chapter 6: Efficient Retrieval from Vector Store—This chapter focuses on techniques for achieving fast and accurate retrieval from vector stores. It covers building approximate nearest-neighbor indexes, optimizing chunk sizes, and caching frequent queries for improved latency. Readers will learn how to re-rank results with cross-encoders, tune retrieval parameters, and use dimensionality reduction for compact storage. The chapter also explores distributed indexing, topic-based partitioning, adaptive chunking strategies, and managing cold versus hot storage to balance performance and scalability in real-world RAG pipelines.

Chapter 7: Response Generation with LLM in RAG Systems—This chapter explores how large language models generate high-quality, context-aware responses within RAG systems. It covers direct answer generation, structured outputs, and chain-of-thought guided reasoning for improved transparency. Readers will learn techniques for cited and confidence-scored responses, hybrid generation using multiple vectors or prompts, and methods for critical verification. The chapter also explains query decomposition, context enrichment, and progressive disclosure to produce accurate, reliable, and well-grounded responses across diverse use cases.

Chapter 8: Prompt Engineering for RAG Systems—This chapter focuses on prompt engineering techniques tailored specifically for RAG systems. It explores context-grounded prompting, query reformulation, and multi-prompt ensembles to improve retrieval relevance and response quality. Readers will learn methods for confidence-aware and cited response prompting, along with strategies for generating structured outputs. The chapter also covers summarization prompts and evidence-highlighting techniques, enabling you to design precise, reliable, and traceable prompts that strengthen both retrieval performance and LLM-generated answers.

Chapter 9: Effective Search for RAG Systems—This chapter explains effective search strategies that enhance retrieval accuracy and relevance in RAG systems. It covers dense retrieval, BM25 keyword search, and hybrid methods that combine semantic and lexical signals. Readers will explore query expansion, semantic filtering, and hierarchical retrieval for multi-level search. The chapter further examines optimal chunking strategies, batch retrieval for efficiency, and metadata- or time-aware retrieval techniques, enabling readers to build highly precise, performant, and context-sensitive search pipelines.

Chapter 10: Implementing RAG with Chains—This chapter explores how to implement RAG using specialized chains that orchestrate retrieval and reasoning. You will learn to build question-answering, conversational, summarization, and cited-response chains, along with document-stuffing and tool-augmented RAG workflows. The chapter also introduces source-aware chains, hybrid dense-sparse retrieval chains, metadata-filtered self-query chains, and re-ranking strategies. By mastering these chaining patterns, you can design modular, reliable, and intelligent RAG pipelines tailored to diverse application needs.

Chapter 11: Agentic RAG with Dynamic Retrieval—This chapter introduces agentic RAG, where autonomous agents dynamically control retrieval, reasoning, and response generation. Readers will explore self-querying agents, tool-augmented task agents, and context-aware conversational agents that refine queries in real time. The chapter covers dynamic re-ranking, adaptive summarization, chain-of-thought retrieval, hybrid dense-sparse strategies, and time-aware or streaming retrieval. By the end, readers will understand how intelligent agents orchestrate flexible, optimized retrieval pipelines for complex, evolving user needs.
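The chunking discussed throughout these chapters can be illustrated with a minimal sketch. This is not code from the book: `split_with_overlap` is a hypothetical character-based splitter where consecutive chunks share an overlap region so that context is not lost at chunk boundaries, the basic behaviour that library splitters refine.

```python
def split_with_overlap(text, chunk_size=40, overlap=10):
    # Slide a window of chunk_size characters, stepping by
    # chunk_size - overlap so consecutive chunks share context.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final window already reaches the end of the text
    return chunks

text = "Retrieval-augmented generation grounds LLM answers in external documents."
chunks = split_with_overlap(text, chunk_size=30, overlap=8)
```

Here the last 8 characters of each chunk reappear at the start of the next; tuning `chunk_size` and `overlap` trades retrieval precision against index size, a recurring theme in Chapters 3 and 6.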
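The hybrid dense-plus-sparse retrieval that Chapters 5, 9, and 10 return to can also be sketched as a simple weighted score fusion. Again, this is an illustrative toy rather than the book's code: `keyword_score` is a crude stand-in for BM25, `dense_score` a term-frequency stand-in for embedding similarity, and `alpha` weights the two signals.

```python
import math
from collections import Counter

def keyword_score(query, doc):
    # Sparse signal: fraction of query terms present in the document
    # (a crude stand-in for BM25).
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def dense_score(query, doc):
    # Dense-signal stand-in: cosine similarity over term frequencies
    # (a real system would compare embedding vectors).
    qa, da = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(qa[t] * da[t] for t in qa)
    nq = math.sqrt(sum(v * v for v in qa.values()))
    nd = math.sqrt(sum(v * v for v in da.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def hybrid_search(query, docs, alpha=0.5):
    # Weighted fusion: alpha * dense + (1 - alpha) * sparse, best first.
    scored = [(alpha * dense_score(query, d)
               + (1 - alpha) * keyword_score(query, d), d) for d in docs]
    return [d for _, d in sorted(scored, key=lambda p: p[0], reverse=True)]

docs = [
    "vector stores index dense embeddings for semantic search",
    "bm25 ranks documents by keyword frequency",
    "agents orchestrate retrieval and generation steps",
]
ranked = hybrid_search("semantic search with embeddings", docs, alpha=0.5)
```

Sliding `alpha` toward 1.0 favours semantic matches, toward 0.0 exact keyword matches; production systems tune this balance (or use rank-based fusion) per corpus and query type.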
Code Bundle and Coloured Images

Please follow the link to download the Code Bundle and the Coloured Images of the book: https://rebrand.ly/970ecb

The code bundle for the book is also hosted on GitHub at https://github.com/bpbpublications/RAG-with-Python-Cookbook. In case there's an update to the code, it will be updated on the existing GitHub repository. We have code bundles from our rich catalogue of books and videos available at https://github.com/bpbpublications. Check them out!

Errata

We take immense pride in our work at BPB Publications and follow best practices to ensure the accuracy of our content and provide an engaging reading experience to our subscribers. Our readers are our mirrors, and we use their inputs to reflect on and improve upon human errors, if any, that may have occurred during the publishing process. To help us maintain quality and reach any readers who might be having difficulties due to unforeseen errors, please write to us at errata@bpbonline.com.

Your support, suggestions, and feedback are highly appreciated by the BPB Publications family.

At www.bpbonline.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on BPB books and eBooks. You can check our social media handles below: Instagram, Facebook, LinkedIn, YouTube. Get in touch with us at business@bpbonline.com for more details.

Piracy

If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at business@bpbonline.com with a link to the material.

If you are interested in becoming an author

If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit www.bpbonline.com. We have worked with thousands of developers and tech professionals, just like you, to help them share their insights with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions. We at BPB can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about BPB, please visit www.bpbonline.com.

Join our Discord space

Join our Discord workspace for the latest updates, offers, tech happenings around the world, new releases, and sessions with the authors: https://discord.bpbonline.com
Table of Contents

1. Foundation of Retrieval-augmented Generation
   Introduction
   Structure
   Objectives
   Software requirements
   Load document
      Recipe 1
   Split document
      Recipe 2
      Recipe 3
      Recipe 4
      Recipe 5
   Embeddings
      Recipe 6
      Recipe 7
   Store
      Recipe 8
      Recipe 9
   Retrieval
      Recipe 10
      Recipe 11
      Recipe 12
      Recipe 13
   Generation
      Recipe 14
   Conclusion

2. Document Loaders for RAG Pipelines
   Introduction
   Structure
   Objectives
   Core components
   Supported document types
   Software requirements
   Document loading from common source types
      Recipe 15
      Recipe 16
      Recipe 17
      Recipe 18
      Recipe 19
      Recipe 20
   Metadata extraction and management
      Recipe 21
   Preprocessing and text normalization
      Recipe 22
      Recipe 23
   Bulk loading with directory-based ingestion
      Recipe 24
      Recipe 25
   Custom document loader
      Recipe 26
   Conclusion

3. Document Splitting Techniques
   Introduction
   Structure
   Objectives
   Software requirements
   N-gram text splitter
      Recipe 27
   Topic-based document splitting
      Recipe 28
   Regex-based splitting
      Recipe 29
   Splitting Markdown text
      Recipe 30
   Grouping and splitting documents by metadata
      Recipe 31
   Time-based splitting
      Recipe 32
   HTML tag-based splitting
      Recipe 33
   Table-based splitting
      Recipe 34
   Page-based splitting
      Recipe 35
   Custom separator splitting
      Recipe 36
      Recipe 37
      Recipe 38
      Recipe 39
      Recipe 40
   Conclusion

4. Embedding Strategies for Vector Retrieval
   Introduction
   Structure
   Objectives
   Software requirements
   Convert text to embeddings
      Recipe 41
   Embed a list of documents into vectors
      Recipe 42
   Embeddings using FAISS
      Recipe 43
   Embedding models for offline use
      Recipe 44
   Customize the embedding function in LangChain
      Recipe 45
   Batch embedding of large documents efficiently
      Recipe 46
   Embed only once and store persistently
      Recipe 47
   Embedding and metadata for better filtering
      Recipe 48
   Embed only summaries
      Recipe 49
   Noise-resistant embeddings with pre-normalization
      Recipe 50
   Embedding chunk graphs
      Recipe 51
   Conclusion

5. Vector Stores for Semantic Retrieval