MANNING
Alessandro Negro, Giuseppe Futia, Vlastimil Kůs, Fabio Montagna
Forewords by Maxime Labonne and Khalifeh AlJadda
[Diagram: Building from Structured and Unstructured Data, and LLMs Support for Querying and Retrieval]
Structured data contains entities and relationships; they must be mapped to the target schema. LLMs process text from unstructured sources (emails, chats, documents) and recognize relevant entities and relationships, and a mapping converts those into the schema. Users can ask questions using natural language: the question is converted into a query that is executed on the knowledge graph or used to retrieve information from the KG via vector similarity. Results returned from the KG are further processed using LLMs to generate a more appropriate answer for the user, in terms of structure and content.
Knowledge Graphs and LLMs in Action
Knowledge Graphs and LLMs in Action

ALESSANDRO NEGRO
GIUSEPPE FUTIA
VLASTIMIL KŮS
FABIO MONTAGNA

FOREWORDS BY MAXIME LABONNE AND KHALIFEH ALJADDA

MANNING
SHELTER ISLAND
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

©2026 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

The author and publisher have made every effort to ensure that the information in this book was correct at press time. The author and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Dustin Archibald
Technical editor: Dimitris Polychronopoulos
Review editor: Radmila Ercegovac
Production editor: Kathy Rossland
Copy editor: Tiffany Taylor
Development copy editor: Frances Buran
Proofreader: Olga Milanko
Technical proofreader: Sachin Panemangalore
Typesetter and cover designer: Marija Tudor

ISBN 9781633439894
Printed in the United States of America
To Aurora, Filippo, and Flavia
—Alessandro

To my family—and especially my parents—for your unwavering love, support, and patience. To my friends and mentors, for walking with me, inspiring me, nudging me forward, and having the courage to be honest when it matters most.
—Vlastimil

To Debora—my constant since we collided on the cosmic graph of life—thank you for walking beside me through every node and edge of this journey. I'm deeply grateful to my parents, Marieta and Cosimo, for your unwavering support—and for only occasionally asking what on earth a knowledge graph is. And to my brother, Dante—your perfectly timed reality checks have kept me grounded (and reasonably sane).
—Giuseppe

To my wife Fiorella and my children Giulio, Azzurra, and Arianna, who patiently endured countless evenings of “I'm almost done writing.”
—Fabio
brief contents

PART 1 FOUNDATIONS OF HYBRID INTELLIGENT SYSTEMS 1
 1 ■ Knowledge graphs and LLMs: A killer combination 3
 2 ■ Intelligent systems: A hybrid approach 17

PART 2 BUILDING KNOWLEDGE GRAPHS FROM STRUCTURED DATA SOURCES 37
 3 ■ Create your first knowledge graph from ontologies 39
 4 ■ From simple networks to multisource integration 65

PART 3 BUILDING KNOWLEDGE GRAPHS FROM TEXT 95
 5 ■ Extracting domain-specific knowledge from unstructured data 97
 6 ■ Building knowledge graphs with large language models 115
 7 ■ Named entity disambiguation 129
 8 ■ NED with open LLMs and domain ontologies 180

PART 4 MACHINE LEARNING ON KNOWLEDGE GRAPHS 207
 9 ■ Machine learning on knowledge graphs: A primer approach 209
10 ■ Graph feature engineering: Manual and semiautomated approaches 233
11 ■ Graph representation learning and graph neural networks 272
12 ■ Node classification and link prediction with GNNs 302

PART 5 INFORMATION RETRIEVAL WITH KNOWLEDGE GRAPHS AND LLMS 335
13 ■ Knowledge graph–powered retrieval-augmented generation 337
14 ■ Asking a KG questions with natural language 356
15 ■ Building a QA agent with LangGraph 397

appendix A Introduction to graphs 435
appendix B Neo4j 447
appendix C Building knowledge graphs from structured sources 461
references 493
index 505
contents

forewords xv
preface xvii
acknowledgments xix
about this book xxi
about the authors xxv
about the cover illustration xxvii

PART 1 FOUNDATIONS OF HYBRID INTELLIGENT SYSTEMS 1

1 Knowledge graphs and LLMs: A killer combination 3
  1.1 Knowledge graphs 4
  1.2 Large language models 6
  1.3 KGs and LLMs: Stronger together 8
  1.4 The paradigm shift in data-driven applications 10
      The four pillars of knowledge graphs 11
  1.5 Building data-driven applications using KGs and LLMs 12
      Example use case: Drug discovery and development 13 ■ Example use case: Conversational AI for customer support 13 ■ Deciding whether to use a KG 14
  1.6 Knowledge graph technologies 14
      Taxonomies and ontologies 15
  1.7 How do we teach KGs and LLMs? 16

2 Intelligent systems: A hybrid approach 17
  2.1 What is intelligence? 18
  2.2 Designing an intelligent system 19
      What is an intelligent system? 20 ■ Categories of intelligent systems 20 ■ Characteristics of an intelligent system 23
  2.3 Knowledge acquisition and representation 24
  2.4 Reasoning 27
  2.5 Reasoning engines 30
      Limitations of a pure deductive reasoning engine 31 ■ Using inductive reasoning and ML 32 ■ The role of LLMs in the reasoning engine 33
  2.6 A KG approach to IASs 33

PART 2 BUILDING KNOWLEDGE GRAPHS FROM STRUCTURED DATA SOURCES 37

3 Create your first knowledge graph from ontologies 39
  3.1 Knowledge graph building: Warmup 41
      Business and domain understanding 41 ■ Data understanding 43
  3.2 Understanding knowledge graph technologies 46
      RDF or LPG? A goal-driven discussion 47 ■ Representing edge properties with RDF and LPG 49
  3.3 Building a knowledge graph 52
      Ontology ingestion and processing with neosemantics 52 ■ Annotation ingestion and processing 55
  3.4 Querying the data 59
  3.5 Reasoning over the KG 62

4 From simple networks to multisource integration 65
  4.1 Biomedical knowledge graphs and applications 66
  4.2 Multi-omic applications of KGs 67
      Creating a KG from the PPI and protein-disease networks 69 ■ High-level analysis of the resulting KGs 73 ■ Domain-specific analysis of the PPI and disease KG 76
  4.3 Pharmaceutical applications of KGs 80
      Deep analysis of the Hetionet knowledge graph 84 ■ LLM-assisted interpretation of pathway analysis results 88
  4.4 Clinical applications of KGs 90
      LLM-guided clinical decision support analysis 93

PART 3 BUILDING KNOWLEDGE GRAPHS FROM TEXT 95

5 Extracting domain-specific knowledge from unstructured data 97
  5.1 The archives challenge 98
  5.2 Key concepts of knowledge extraction 99
      Recognizing named entities 100 ■ Extracting relations 101
  5.3 Building KGs with large language models 101
      Using LLMs 102 ■ Prompt engineering examples 104 ■ Prompt engineering guidelines 109 ■ KG building: Traditional NLP or LLMs? 112

6 Building knowledge graphs with large language models 115
  6.1 Transforming an archive to a KG 116
      Graph modeling 118 ■ Creating a metagraph 119 ■ Normalization and cleansing 119 ■ Graph-based entity resolution 120
  6.2 Intellectual network analysis: The value of graphs 122
  6.3 Next steps in the Rockefeller Archive Center project 126
  6.4 The value of knowledge graphs in the LLM era 127

7 Named entity disambiguation 129
  7.1 From recognition to disambiguation 129
  7.2 Understanding named entity disambiguation 132
  7.3 Domain-based NED and LLMs 136
  7.4 Business and domain understanding 138
      Context 138 ■ Use case definition 140
  7.5 Understanding the data 141
      Unstructured data 141 ■ Domain ontologies 142
  7.6 Building a SoHO knowledge graph 146
      Defining the schema 147 ■ Processing and ingesting documents 148 ■ Disambiguating and ingesting medical entities 149
      Processing, loading, and mapping ontologies 152 ■ Generating entity co-occurrences 157
  7.7 KG-based use cases 158
      Conceptual search 159 ■ Structured knowledge-based search 162 ■ KG-based interpretability and discovery 166 ■ Uncovering new knowledge 174

8 NED with open LLMs and domain ontologies 180
  8.1 Understanding limitations of traditional NED systems 180
  8.2 Ingesting the domain ontology 182
  8.3 Setting up the model with Ollama and Llama 3.1 8B 186
  8.4 End-to-end NED process 187
      Named entity recognition 188 ■ Candidate selection 192 ■ Candidate disambiguation 194
  8.5 Conclusions 205

PART 4 MACHINE LEARNING ON KNOWLEDGE GRAPHS 207

9 Machine learning on knowledge graphs: A primer approach 209
  9.1 Machine learning on graphs: Why? 210
  9.2 Machine learning on graphs: What? 211
      Node classification 211 ■ Link prediction (a.k.a. relationship prediction) 214 ■ Clustering and community detection 216 ■ Graph classification 217
  9.3 Machine learning on graphs: How? 219
      Node classification and link prediction 220 ■ Graph classification 228 ■ Graph clustering 229

10 Graph feature engineering: Manual and semiautomated approaches 233
  10.1 Manual node features 235
       Degree 237 ■ Triangles 239 ■ Density 241 ■ Geodesic (or shortest) path 242 ■ Closeness 244 ■ Betweenness 247 ■ PageRank 249 ■ Prediction 250
  10.2 Manual relationship features 254
       Node-based representation 255 ■ Path-based features 256
  10.3 Semiautomated feature extraction 263
       Performing ReFeX manually 266 ■ Performing ReFeX automatically with code 268

11 Graph representation learning and graph neural networks 272
  11.1 Embeddings in graph representation learning 273
       Understanding graph embeddings: From discrete to continuous 274 ■ Real-world applications and examples 278
  11.2 The encoder–decoder model 279
       The encoder: Converting graph structure to vectors 279 ■ The decoder: Reconstructing graph properties 280 ■ The power of the framework 280 ■ Node2Vec: An example of an encoder–decoder framework 280
  11.3 Shallow embeddings: A first approach to graph representation 283
       Understanding shallow embeddings 283 ■ Limitations of shallow embeddings 284
  11.4 Embeddings in knowledge graphs 285
       Loss function 285 ■ Multirelationship decoder 288
  11.5 Message passing and graph neural networks 289
       The message-passing framework: A neural conversation 289 ■ Motivation and intuition: Why message passing works 290 ■ The basic GNN model 291 ■ Message passing with self-loops 291
  11.6 Generalized aggregation and update methods 292
       Neighborhood normalization 293 ■ Neighborhood attention 294 ■ Multihead attention and transformer connections 294 ■ Generalized update methods 297
  11.7 The synergy of GNNs and LLMs 299

12 Node classification and link prediction with GNNs 302
  12.1 Node classification for anti-money laundering applications 303
       Input data 304 ■ Graph processor: Data preparation 305 ■ Graph processor: Homogeneous PyG graph 307 ■ Encoder–decoder architecture 310 ■ Evaluation and analysis 313
  12.2 Link prediction for movie recommendations 317
       Input data 318 ■ Graph processor: Data preparation 319 ■ Graph processor: Heterogeneous PyG graph 321 ■ Encoder–decoder architecture 326 ■ Evaluation and analysis 330

PART 5 INFORMATION RETRIEVAL WITH KNOWLEDGE GRAPHS AND LLMS 335

13 Knowledge graph–powered retrieval-augmented generation 337
  13.1 AI agents 338
  13.2 Chatting with the LLM 339
  13.3 Challenges in the production environment 341
  13.4 Chatting with the AI about private data 342
       Retrieval-augmented generation 343 ■ Vector-based RAG limitations 345 ■ Graph RAG 347 ■ Reasoning agents 351 ■ Let's chat with our KG 352

14 Asking a KG questions with natural language 356
  14.1 Querying a knowledge graph in the policing domain 357
       Enabling domain experts with knowledge graphs 357
  14.2 RAG for KG querying: Capabilities and challenges 358
       RAG effectiveness with complete context 359 ■ RAG fragility with incomplete retrieval 361
  14.3 Schema-based approach for querying KGs 363
       Understanding and using graph schemas 364
  14.4 Think like an expert: Using metadata for enhanced querying 366
  14.5 Intent detection: Understanding user expectations 367
       Classifying by visualization type 368 ■ Is it data, documentation, or just complaining? 372
  14.6 From schema to LLM-ready context 376
       Schema extraction and representation 377 ■ Enriching schemas with descriptive annotations 380 ■ A practical approach to schema representation 382
  14.7 It's time to think: Understanding LLM reasoning 383
       The order matters: Answer first vs. reasoning first 384 ■ Thinking in queries: From text to Cypher 386 ■ Structuring output for reliable query generation 391
  14.8 Response summarization: From results to insights 392

15 Building a QA agent with LangGraph 397
  15.1 Building the LangGraph pipeline 398
       System architecture overview 399 ■ Configuring pipeline components 401 ■ Schema translation service 404 ■ State management design 408 ■ Pipeline agent implementation 409 ■ Pipeline integration layer 415
  15.2 Streamlit application 417
       Application overview 418 ■ LangGraph integration 420
  15.3 Expert-emulating investigation 422
       Identifying the initial case 423 ■ Spatial analysis of surveillance coverage 425 ■ Vehicle pattern detection 427 ■ Context-aware request refinement 428 ■ Historical record analysis 430
  15.4 Future directions and enhancements 432
       Learning from use 432 ■ Enhancing core capabilities 433 ■ Advanced evolution paths 433

appendix A Introduction to graphs 435
appendix B Neo4j 447
appendix C Building knowledge graphs from structured sources 461
references 493
index 505
forewords

Working with graph neural networks and large language models over the years has taught me that each technology has profound strengths and equally profound limitations. Graph neural networks excel at understanding structured relationships but struggle with natural language interfaces. Large language models can engage in sophisticated conversations but frequently hallucinate facts and lack reliable grounding in structured knowledge.

Knowledge Graphs and LLMs in Action tackles an important challenge in AI: how do we combine these technologies to build systems that are both intelligent and trustworthy? Alessandro Negro, Giuseppe Futia, Vlastimil Kůs, and Fabio Montagna don't just theorize about this convergence; they provide practical recipes for making it work. Their approach bridges the gap between the precision of knowledge graphs and the accessibility of natural language, creating systems that can reason over complex data and explain their conclusions.

What impressed me most about this work is its rare emphasis on real-world implementation. The authors walk you through building knowledge graphs from messy, unstructured data and then show how to integrate them with language models for applications in healthcare, law enforcement, and beyond. The examples are concrete and the code is production-ready, making this both a learning resource and a practical guide.

The technical depth here is substantial, covering everything from graph construction to advanced retrieval systems, but the authors never lose sight of the practical goal: building AI systems that can serve as reliable advisors in critical decisions. This hybrid approach addresses the reliability and explainability challenges that have limited AI deployment in high-stakes environments.
If you're working on AI systems that need to be both powerful and trustworthy, Knowledge Graphs and LLMs in Action provides a clear framework for achieving it. The combination of knowledge graphs and language models represents a significant step toward AI that can handle complexity while maintaining the transparency and reliability that real-world applications demand.

—MAXIME LABONNE
HEAD OF POST-TRAINING, LIQUID AI

As a data science leader and passionate advocate for knowledge graphs, I'm thrilled to recommend Knowledge Graphs and LLMs in Action. We are witnessing a transformative moment in AI, shaped by the rise of generative AI and large language models (LLMs). Systems like Gemini and ChatGPT have opened the doors to natural language interaction at scale, offering a glimpse of intelligent machines. Yet we know these models are not without flaws. Hallucinations, outdated knowledge, limited transparency, and a lack of contextual grounding remain real challenges.

Addressing concerns like these is where knowledge graphs (KGs) shine, not just as a complement to LLMs, but as a necessary foundation for building accurate, explainable, and context-aware systems. This book demonstrates how the convergence of KGs and LLMs creates a powerful synergy, mitigating each other's weaknesses while unlocking their full potential.

The authors—Alessandro Negro, Vlastimil Kůs, Giuseppe Futia, and Fabio Montagna—bring years of hands-on experience and consulting expertise. Their work moves beyond theory to deliver actionable, production-ready insights grounded in real-world applications.

This book is more than a reference for knowledge graphs and LLMs. It's a practical toolkit for developing intelligent systems that enhance, not replace, human decision-making across domains like healthcare, finance, and law enforcement.

In an age where AI must be transparent, contextual, and trustworthy, this book is both timely and essential. It belongs on the shelf of every data scientist, engineer, architect, and knowledge-driven professional ready to build the next generation of intelligent systems.

Thank you, Alessandro, Vlastimil, Giuseppe, and Fabio, for this insightful and practical book!

—KHALIFEH ALJADDA
DIRECTOR OF DATA SCIENCE, GOOGLE CUSTOMER SOLUTIONS
preface

When I was nearing completion of my previous book, Graph-Powered Machine Learning, I reached out to my acquisitions editor, Mike Stephens, with a proposal for a natural continuation. That earlier work introduced knowledge graphs and demonstrated how they could be built using natural language processing, but many readers pointed out that graph neural networks were a significant missing piece. My proposed book would fill that gap while extending the knowledge graph story further, including detailed analysis and building techniques.

Mike accepted the proposal, and I embarked on a new adventure with the working title Knowledge Graphs Applied. Recognizing the scope of the challenge, I invited three colleagues from GraphAware—Fabio, Giuseppe, and Vlastimil—to join the effort, confident that their combined expertise would be invaluable. I naively thought that if one author could write a book in four years, four authors could complete a book in just a year. That assumption proved as flawed as expecting nine women to deliver a baby in one month. Reality had other plans.

Over the past years, significant changes swept through the technology landscape. Large language models (LLMs) and generative AI disrupted the field entirely, and knowledge graph practitioners suddenly found themselves with unprecedented opportunities to use this established technology in revolutionary ways. We initially planned to build on existing natural language processing (NLP) tools like BERT, but these were rapidly being superseded by LLM capabilities that opened new possibilities for building, querying, and analyzing knowledge graphs. This was precisely where many practitioners, ourselves included, were struggling.

Rather than resist this transformation, we decided, together with Mike and Dustin Archibald (our development editor), to embrace it. We adjusted our title to Knowledge Graphs and LLMs in Action and substantially revised the content to position LLMs as an integral component of our ultimate goal: intelligent advisor systems that empower humans in performing complex decision-making tasks.
This pivot required extensive refactoring and a fundamental shift in our approach, but the result exceeded our expectations.

The book you are reading has evolved into a manifesto for the power of hybrid systems. It demonstrates how combining these technologies—knowledge graphs, which are well established, and LLMs, which are newly emerged—creates a flywheel effect that delivers remarkable long-term results. Knowledge graph practitioners will discover how to use LLM capabilities for greater impact, and LLM practitioners will learn techniques that address some of the major limitations of language models.

We invite you to join us on this journey toward more intelligent, more reliable, and more human-centered AI systems.