Essential GraphRAG: Knowledge Graph-Enhanced RAG

Authors: Tomaž Bratanič, Oskar Hane

📄 Page 1
Manning. Tomaž Bratanič, Oskar Hane. Foreword by Paco Nathan. Knowledge Graph-Enhanced RAG.
📄 Page 2
How Retrieval-Augmented Generation (RAG) Works (diagram): a question is passed to a retrieval step, which performs a smart search and lookup over a specific (private) knowledge base and returns relevant documents; the question plus those relevant documents then go to the LLM, which generates an answer based on the provided documents. Instead of relying on the LLM’s internal knowledge, relevant information is retrieved from a knowledge base and provided as context to the LLM. This boosts accuracy by grounding the LLM’s response in factual, up-to-date information.
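The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration only: the keyword-overlap retriever, the toy corpus, and the prompt template below are stand-ins for the real embedding-based retrieval and LLM calls the book develops, not the book's implementation.

```python
# Minimal RAG sketch: retrieve relevant documents, then build a grounded prompt.
# The keyword-overlap "retriever" and the prompt template are illustrative
# stand-ins, not the book's actual implementation.

corpus = [
    "Neo4j is a graph database that stores nodes and relationships.",
    "Cypher is the query language used by Neo4j.",
    "RAG grounds LLM answers in retrieved documents.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def build_prompt(question: str, docs: list[str]) -> str:
    """Combine the question with retrieved context for the LLM."""
    context = "\n".join(docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "What query language does Neo4j use?"
prompt = build_prompt(question, retrieve(question, corpus))
# `prompt` would then be sent to an LLM of your choice to generate the answer.
```

The final step, sending `prompt` to a model, is deliberately left out; any chat-completion API slots in there.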
📄 Page 3
Essential GraphRAG
📄 Page 5
Essential GraphRAG: Knowledge Graph-Enhanced RAG. Tomaž Bratanič, Oskar Hane. Foreword by Paco Nathan. Manning, Shelter Island.
📄 Page 6
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact Special Sales Department, Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964. Email: orders@manning.com

©2025 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

The authors and publisher have made every effort to ensure that the information in this book was correct at press time. The authors and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein.

Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964

Development editor: Ian Hough
Technical editor: Arturo Geigel
Review editor: Kishor Rit
Production editor: Kathy Rossland
Copy editor: Kari Lucke
Proofreader: Katie Tennant
Technical proofreader: Jerry Kuch
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

ISBN: 9781633436268
Printed in the United States of America
📄 Page 7
brief contents

1 Improving LLM accuracy 1
2 Vector similarity search and hybrid search 17
3 Advanced vector retrieval strategies 30
4 Generating Cypher queries from natural language questions 45
5 Agentic RAG 56
6 Constructing knowledge graphs with LLMs 70
7 Microsoft’s GraphRAG implementation 88
8 RAG application evaluation 116
appendix A The Neo4j environment 127
📄 Page 8
contents

foreword ix
preface xi
acknowledgments xii
about this book xiv
about the authors xvii
about the cover illustration xviii

1 Improving LLM accuracy 1
1.1 Introduction to LLMs 2
1.2 Limitations of LLMs 5
    Knowledge cutoff problem 5 ■ Outdated information 6 ■ Pure hallucinations 6 ■ Lack of private information 7
1.3 Overcoming the limitations of LLMs 9
    Supervised finetuning 9 ■ Retrieval-augmented generation 10
1.4 Knowledge graphs as the data storage for RAG applications 14

2 Vector similarity search and hybrid search 17
2.1 Components of a RAG architecture 18
    The retriever 18 ■ The generator 20
📄 Page 9
2.2 RAG using vector similarity search 20
    Application data setup 21 ■ The text corpus 21 ■ Text chunking 21 ■ Embedding model 22 ■ Database with vector similarity search function 23 ■ Performing vector search 24 ■ Generating an answer using an LLM 26
2.3 Adding full-text search to the RAG application to enable hybrid search 27
    Full-text search index 27 ■ Performing hybrid search 27
2.4 Concluding thoughts 29

3 Advanced vector retrieval strategies 30
3.1 Step-back prompting 34
3.2 Parent document retriever 36
    Retrieving parent document strategy data 41
3.3 Complete RAG pipeline 43

4 Generating Cypher queries from natural language questions 45
4.1 The basics of query language generation 46
4.2 Where query language generation fits in the RAG pipeline 47
4.3 Useful practices for query language generation 47
    Using few-shot examples for in-context learning 47 ■ Using database schema in the prompt to show the LLM the structure of the knowledge graph 48 ■ Adding terminology mapping to semantically map the user question to the schema 51 ■ Format instructions 51
4.4 Implementing a text2cypher generator using a base model 52
4.5 Specialized (finetuned) LLMs for text2cypher 54
4.6 What we’ve learned and what text2cypher enables 55

5 Agentic RAG 56
5.1 What is agentic RAG? 57
    Retriever agents 57 ■ The retriever router 58 ■ Answer critic 58
5.2 Why do we need agentic RAG? 59
📄 Page 10
5.3 How to implement agentic RAG 59
    Implementing retriever tools 59 ■ Implementing the retriever router 62 ■ Implementing the answer critic 66 ■ Tying it all together 68

6 Constructing knowledge graphs with LLMs 70
6.1 Extracting structured data from text 71
    Structured Outputs model definition 73 ■ Structured Outputs extraction request 78 ■ CUAD dataset 79
6.2 Constructing the graph 81
    Data import 82 ■ Entity resolution 84 ■ Adding unstructured data to the graph 85

7 Microsoft’s GraphRAG implementation 88
7.1 Dataset selection 89
7.2 Graph indexing 90
    Chunking 90 ■ Entity and relationship extraction 92 ■ Entity and relationship summarization 96 ■ Community detection and summarization 100
7.3 Graph retrievers 103
    Global search 104 ■ Local search 109

8 RAG application evaluation 116
8.1 Designing the benchmark dataset 118
    Coming up with test examples 118
8.2 Evaluation 121
    Context recall 121 ■ Faithfulness 121 ■ Answer correctness 122 ■ Loading the dataset 123 ■ Running evaluation 123 ■ Observations 124
8.3 Next steps 125

appendix The Neo4j environment 127
references 151
index 153
📄 Page 11
foreword

In Essential GraphRAG, Tomaž and Oskar demonstrate how to implement a GraphRAG system from scratch, without relying on existing frameworks. They pull back the curtain, revealing the code behind contemporary AI applications. The book covers major GraphRAG innovations through worked examples you can code and run. Exercises explore nuances and alternatives, with references to primary sources on arXiv. Starting with simple RAG patterns, chapters progress through GraphRAG techniques to agentic workflows.

By working through these coding examples, reading referenced articles, and solving exercises, you’ll learn

■ How RAG improves large language model accuracy by retrieving external data
■ How knowledge graphs extend RAG for more structured and precise information retrieval
■ How to use query rewriting techniques and strategies for embedding and document chunking, adapted for various use cases
■ How to build agentic systems for complex scenarios

At every step, Tomaž and Oskar guide you on improving retrieval accuracy, structuring responses, and evaluating results, helping you understand the tradeoffs of mixing and matching approaches for your specific needs.

Ultimately, the power of AI applications doesn’t come from ineffable magic but from confident, experienced builders who understand these technologies and continuously learn by doing. We’ve seen large language model–based applications evolve
📄 Page 12
rapidly over the past eight years, with much more to come. This book provides a solid foundation for building the future.

—PACO NATHAN
Senzing, Principal DevRel Engineer
📄 Page 13
preface

This book came about because we (Oskar and Tomaž) had been working together for a few years at Neo4j and kept arriving at the same thought: someone should write a book about combining knowledge graphs with retrieval-augmented generation (RAG). We figured it might as well be us. The idea wasn’t born from some grand epiphany—it was just a practical realization. We’d both spent enough time with graphs, machine learning, and generative AI to see that large language models (LLMs) had real limitations, like outdated info or missing domain-specific details. Knowledge graphs seemed like an obvious way to fix that, and it wasn’t that hard to put the two together.

Our backgrounds made it a natural fit. Oskar, with over 20 years as a software engineer and a decade at Neo4j, leads the generative AI engineering team, focused on helping developers build GenAI apps with graphs. Tomaž has deep experience in graph algorithms, machine learning, and LLMs, contributing to frameworks like LangChain and LlamaIndex while writing about practical LLM applications. Together, we’d already been tinkering with these ideas—extracting structured data from text, plugging it into graphs, and using it to boost RAG. It worked well enough in our day-to-day that we thought others could use it too.

The result is this book. It’s not here to overcomplicate things or sell you on some revolutionary breakthrough. We wrote it because we’ve seen GraphRAG solve problems in a way that’s practical and doable, whether you’re new to this or already deep in the weeds. If you’re curious about making LLMs sharper with graphs, this is our take on how to get it done. Simple as that.
📄 Page 14
acknowledgments

We’d like to thank everyone who helped make this book possible. To our colleagues at Neo4j: your insights, feedback, and shared passion for graphs and generative AI kept us on track and inspired us to dig deeper. A special nod goes to the engineering and research teams—your work laid the groundwork for many ideas in these pages.

We’re grateful to the Manning team for guiding us through the process with patience and expertise. Their support turned our rough drafts into something worth reading. Special thanks go to Paco Nathan for writing the foreword to this book. Many thanks also go to technical editor Arturo Geigel for the invaluable help that he gave us. Arturo is an independent researcher from Puerto Rico who is recognized for being the inventor of Neural Trojans and currently carries out research in machine learning, graph theory, and technological analysis.

Thanks also go to the reviewers who took the time to read early versions and offer sharp, constructive notes that made this book better: Abhilash Babu, Adil Patel, Avinash Tiwari, Balbir Singh, George Robert Freeman, Giampiero Granatella, Gourav Sengupta, Harpal Singh, Igor Karp, Jared Duncan, Jayesh Kapadnis, Jeremy Chen, John Montgomery, Kanak Kshetri, Kasanicova Kristina, Laurens Meulman, Mehmet Yilmaz, Michael Bateman, Najeeb Arif, Peter V. Henstock, Praveen Gupta Sanka, Rani Sharim, Ravindra Jaju, Richard Meinsen, Ronald Borman, Saravanan Muniraj, Sergio Fernández Gonzalez, Shiroshica Kulatilake, Shyam Viswanathan, Sumit Pal, Tathagata Dasgupta, Varadharajan Pundi Sridhar, Wayne Mather, and Yilun Zhang.

To our families (Oskar’s Johanna, Stella, Molly; Tomaž’s Anica, Blaz, Brina) and friends: Thank you for putting up with the late nights and endless shop talk. Your
📄 Page 15
encouragement kept us going. Finally, a shoutout goes to the broader graph and GenAI community—your innovations and discussions pushed us to write something practical and useful. This book is as much a product of your collective energy as it is ours.
📄 Page 16
about this book

Essential GraphRAG was written to guide readers in enhancing retrieval-augmented generation (RAG) systems by integrating knowledge graphs with large language models (LLMs). The book aims to address the limitations of LLMs, such as outdated knowledge, hallucinations, and a lack of domain-specific data, by combining structured and unstructured data through practical methodologies and hands-on examples.

The primary goal of Essential GraphRAG is to demonstrate how knowledge graphs can improve the accuracy, performance, and traceability of RAG systems in generative AI applications. The book explores grounding LLMs with both structured and unstructured data, offering a comprehensive guide to building a GraphRAG system from scratch. It combines years of expertise in graphs, machine learning, and application development to present stable architectural patterns in a rapidly evolving field. Readers will learn to implement GraphRAG without relying on existing frameworks, extract structured knowledge from text, and develop applications that blend vector-based and graph-based retrieval methods, including Microsoft’s GraphRAG approach. The book encourages active participation through its liveBook discussion forum to refine content and deepen collective understanding.

Who should read this book

This book is intended for data scientists, software engineers, and developers seeking to enhance their generative AI toolkit by incorporating knowledge graphs into RAG workflows. It is ideal for individuals with a basic understanding of Python, LLMs, and data processing concepts who are eager to address LLM limitations, like factual
📄 Page 17
inaccuracies or knowledge cutoffs. The structured approach caters to a broad audience: junior practitioners will gain a solid foundation in GraphRAG techniques, while experienced professionals will find advanced strategies, like Microsoft’s GraphRAG implementation, and fresh perspectives to elevate their work. Domain experts in fields like legal, literature, or business intelligence, where structured data and narrative summarization are critical, will also benefit from the practical examples and methodologies.

How this book is organized: A road map

The book is organized into eight chapters, some building on the previous to guide readers from foundational concepts to advanced GraphRAG implementations:

■ Chapter 1 introduces LLMs, their limitations (e.g., knowledge cutoff, hallucinations), and how RAG with knowledge graphs can overcome these issues using structured and unstructured data.
■ Chapter 2 covers embeddings, vector similarity search, and hybrid search techniques, providing a practical walkthrough of a RAG application, starting with unstructured data.
■ Chapter 3 delves into sophisticated retrieval methods to enhance RAG performance.
■ Chapter 4 teaches you how to convert natural language questions into Cypher queries for graph databases, enhancing retrieval flexibility.
■ Chapter 5 explores autonomous RAG systems that use LLMs and graphs for complex tasks.
■ Chapter 6 guides readers through extracting structured data from text (e.g., legal contracts) and building knowledge graphs, using tools like Neo4j.
■ Chapter 7 explores Microsoft’s GraphRAG pipeline using The Odyssey, focusing on entity/relationship extraction, community detection, and global/local search retrieval for summarization-heavy RAG applications.
■ Chapter 8 focuses on assessing the performance and reliability of GraphRAG systems.
The book progresses from understanding LLM constraints and basic RAG to advanced graph-enhanced techniques, including Microsoft’s innovative summarization-focused approach, culminating in practical applications and evaluation.

About the code

This book contains many examples of source code both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text.

In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In some cases, even this was not enough, and listings include line-continuation markers (➥). Additionally, comments in the source code have often been removed
📄 Page 18
from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.

Source code examples are available in the book’s accompanying GitHub repository, https://github.com/tomasonjo/kg-rag. The repository contains Jupyter notebooks and Python scripts for each chapter, allowing readers to follow along with the book’s content. The code is organized by chapter, making it easy to find specific examples and implementations. Additionally, the repository includes instructions for setting up the necessary environment and dependencies to run the code locally.

You can get executable snippets of code from the liveBook (online) version of this book at https://livebook.manning.com/book/essential-graphrag. The complete code for the examples in the book is also available for download from the Manning website at https://www.manning.com/books/essential-graphrag.

liveBook discussion forum

Purchase of Essential GraphRAG includes free access to liveBook, Manning’s online reading platform. Using liveBook’s exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It’s a snap to make notes for yourself, ask and answer technical questions, and receive help from the authors and other users. To access the forum, go to https://livebook.manning.com/book/essential-graphrag/discussion.

Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the authors can take place. It is not a commitment to any specific amount of participation on the part of the authors, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking them some challenging questions lest their interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
📄 Page 19
about the authors

TOMAŽ BRATANIČ has extensive experience with graphs, machine learning, and generative AI. He has written an in-depth book about using graph algorithms in practical examples. Nowadays, he focuses on generative AI and LLMs by contributing to popular frameworks like LangChain and LlamaIndex and writing blog posts about LLM-based applications.

OSKAR HANE is a senior staff software engineer at Neo4j. He has over 20 years of experience as a software engineer and 10 years of experience working with Neo4j and knowledge graphs. He is currently leading the generative AI engineering team within Neo4j, with a focus on providing the best possible experience for other developers to build GenAI applications with Neo4j.
📄 Page 20
about the cover illustration

The figure on the cover of Essential GraphRAG is “Likanienne,” or “A woman from Lika,” taken from Balthasar Hacquet’s Illustrations de L’Illyrie et la Dalmatie. In those days, it was easy to identify where people lived and what their trade or station in life was just by their dress. Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional culture centuries ago, brought back to life by pictures from collections such as this one.