📄 Page
1
Ivan Reznikov LangChain for Life Sciences and Healthcare Innovation Through LLMs and Generative AI Agents
📄 Page
2
ISBN: 978-1-098-16263-4
US $79.99 CAN $99.99
DATA / DATA SCIENCE

This groundbreaking book by Dr. Ivan Reznikov is specifically designed for scientists, researchers, and medical professionals who want to harness the power of generative AI to revolutionize and elevate their research capabilities.

The first half of this two-part guide is essential for any specialist, covering the transition from traditional statistics to generative AI, the fundamentals of LLMs, AI agents, and the practical uses of LangChain and LangGraph. The second will appeal to life sciences professionals who want to create AI applications and multi-agent AI systems for chemistry, biology, drug development, healthcare, and more.

• Learn how to create and integrate generative AI and LangChain applications into research
• Discover how to substantially accelerate your experimental and data analysis
• Explore cutting-edge AI solutions designed to address complex research problems
• Gain the skills and knowledge to advance your career in AI-enhanced life sciences

LangChain for Life Sciences and Healthcare

“What sets this book apart is its incredible depth of practical examples. It contains hundreds of LangChain and LangGraph code samples specifically tailored for life sciences and healthcare, giving you a comprehensive toolkit for every research challenge.”
Harrison Chase, CEO and cofounder of LangChain

“As someone who works technically with LLMs, I was surprised how effectively this technology can be applied in life sciences. Lots of use cases, lots of new insights!”
Christian Winkler, professor, Nuremberg University of Applied Sciences

“Expertly written and seamlessly connects AI fundamentals using LangChain to real-world life science applications. This book is essential reading for researchers and technologists ready to harness generative AI’s potential.”
Kerrie Holley, former director of healthcare and life sciences industry solutions at Google

Ivan Reznikov holds a PhD in chemistry and has expertise in computational modeling, data science, and generative AI. He’s amassed over a decade of experience seamlessly blending the realms of life sciences, healthcare, and data science. This unique background allowed him to pen this book, providing unparalleled insights and knowledge to unlock the potential of LangChain in scientific research and medical cases.
📄 Page
4
LangChain for Life Sciences and Healthcare
by Ivan Reznikov

Copyright © 2025 Ivan Reznikov. All rights reserved.

Published by O’Reilly Media, Inc., 141 Stony Circle, Suite 195 Santa Rosa, CA 95401.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Michelle Smith
Development Editor: Corbin Collins
Production Editor: Beth Kelly
Copyeditor: Piper Content Partners
Proofreader: Andrea Schein
Indexer: Judith McConville
Cover Designer: Karen Montgomery
Cover Illustrator: Karen Montgomery
Interior Designer: David Futato
Interior Illustrator: Kate Dullea

July 2025: First Edition

Revision History for the First Edition
2025-07-16: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098162634 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. LangChain for Life Sciences and Healthcare, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-16263-4
[LSI]
📄 Page
5
To my wife Nastassia, my muse no AI can replace.
📄 Page
7
Table of Contents

Preface ix

Part I. Generative AI, Understanding Large Language Models, and LangChain

1. From Statistics to Generative AI in Life Sciences 3
   Introduction 3
   Application of Generative AI in Life Sciences 5
   Audio and Visual 7
   Text 8
   Scientific Components 10
   Research Studies 12
   Drawbacks of Generative AI in Science 13
   Summary 16

2. Introducing Large Language Models 19
   Embedding Models 23
   Chat and Large Language Models 35
   Tokens 35
   Text and Sequence Generation 39
   Decoding Strategies 48
   All Sorts of Language Models 53
   Large Language Model Limitations 57
   Summary 59
📄 Page
8
3. Introducing LangChain 61
   Indexes 62
   Indexing 65
   Vector Search 67
   Vector Stores 68
   Chains 71
   The LangChain Expression Language 71
   LangGraph 75
   Prompts 78
   Memory 87
   Tools 91
   Agents 95
   Creating Apps with LangChain 98
   Summary 109

4. Hallucinations and RAG Systems 111
   Hallucinations, Their Causes, and Consequences 111
   Hallucinations and Possible Solutions 116
   Retrieval-Augmented Generation 118
   Indexing and Data Preparation 122
   Query Translation and Understanding 126
   Routing to Correct Database/Index 131
   Query Construction 134
   Data Retrieval 140
   Data Augmentation and Response Generation 141
   RAG Variations: Self-RAG, Tree-RAG, CAG, Agentic RAG 143
   Evaluating RAGs 145
   The Advantages of Hallucinating 148
   Summary 148

5. Building Personal Assistants 151
   Building Assistants with Chains 151
   Building Assistants with Agents 163
   Building Assistants with Multiple Agents 172
   Model Context Protocol 186
   Summary 189
📄 Page
9
Part II. Building AI Agents and Assistants Using LangChain and LangGraph

6. LangChain for Chemistry 193
   Generative AI in Chemistry 194
   Text-Based 194
   Code-Generative 195
   Chemistry-Generative 196
   Creating Applications with External Packages 198
   ChemCrow and CACTUS 200
   LLMs 204
   LCEL Chains 208
   Custom LangChain Agent 209
   RDKit Custom Agents 211
   Using Chemistry-Based LLMs 217
   Using Text-Based LLMs in Chemical Applications 230
   Summary 232

7. LangChain for Biology 235
   LLMs in Biology 235
   Biological LangGraph Application 238
   Creating Biological Tools 238
   Fine-Tuning Large Language and Reasoning Models 251
   Introduction to Large Reasoning Models 252
   Summary 264

8. LangChain for Drug Discovery 265
   In Silico Drug Discovery 265
   Small Molecule Generation 268
   Autoencoders 268
   Knowledge Graphs 281
   Neo4j Vectors 290
   Summary 297

9. LangChain for Medicine and Healthcare 299
   Generative AI for Healthcare 301
   Creating a Generative Healthcare Application 302
   Brainstorming Assistant 303
   Advanced Brainstorming Scenario 309
   Integrating Speech-to-Text 317
   RAG over SQL 320
📄 Page
10
   Summarization 327
   Report Generation 329
   Multi-Team Applications 331
   Adopting AI in Healthcare and Medicine 336
   Summary 337

10. LangChain for Enterprise 339
   Guardrails, Enterprise Best Practices, and Policies 340
   Data Security, Privacy, and Compliance 341
   Prompt Injection 347
   Fallbacks 350
   Off-Topic Questions 352
   Preventing the Generation of Harmful Content and Toxicity 353
   Evaluating LLMs and Generative AI Applications 357
   LangChain and LangGraph Alternatives and Add-Ons 358
   Data Integration and Retrieval Frameworks 358
   Low-Code/No-Code Platforms 365
   LLM Observability and Debugging Tools 368
   Langfuse 371
   AI Agent and Workflow Frameworks, and Specialized LLM Tools 373
   Multi-Agent Frameworks 375
   Summary 382

Index 385
📄 Page
11
Preface

One Chain to link them all
One Chain to guide them,
One Chain to serve them all
And with AI insight them
—Lord of the Chains

(Hallucinations of the author after writing a book for more than a year. He understands a chain to be just a sequence of rings.)

For centuries, life scientists have painstakingly unraveled the mysteries of nature, one experiment, one observation at a time. But what if we could accelerate discovery exponentially? What if we could empower researchers with intelligent tools capable of not only analyzing vast datasets but also generating novel hypotheses and solutions?

The AI road map, according to several AI giants, may look as follows:

Chatbots
    AI systems designed for conversational interactions, handling basic queries and tasks

Reasoners
    Advanced AI capable of human-level problem-solving, analyzing complex data, and providing strategic solutions

Agents
    Autonomous systems that can execute tasks, make decisions, and adapt to new situations without human intervention

Innovators
    AI that contributes to invention and innovation, generating new ideas and optimizing designs
📄 Page
12
Organizations
    Sophisticated AI entities capable of performing the work of an entire organization, managing operations, and making strategic decisions

At the time of this book’s publication, we’re moving from reasoners to agents. This book will allow you to go even further and develop your own agents and multi-agent innovators, capable of creating new ideas and brainstorming.

The book is separated into two parts. Part I is dedicated to setting the stage:

• Chapter 1 surveys the modern generative AI landscape.
• Chapter 2 discusses how large language models work.
• Chapter 3 teaches how to use LangChain components.
• Chapter 4 describes what hallucinations are, when to use them, and how to avoid them.
• Chapter 5 explores ways in which LangChain applications can speed up general research with a debate machine and showcases simple LangGraph teams.

Part II is dedicated to research, life science domains, and building commercial applications:

• Chapter 6 examines fine-tuned chemical models and teaches how to build multifunctional chemistry AI assistants.
• Chapter 7 develops research teams with an AlphaFold team member, DNA generation agents, and many more, followed by fine-tuning a DeepSeek reasoning model on biological data.
• Chapter 8 explores how to generate molecules with preset characteristics using variational autoencoders and looks into merging graph technologies with generative AI.
• Chapter 9 builds a powerful LangGraph team of different AI superagents, responsible for speech-to-text, retrieving table data, generating reports, and performing hypothesis reasoning.
• Chapter 10 concludes the book by discussing guardrails and best practices regarding data privacy, security, and compliance, looking into some alternatives such as LlamaIndex, CrewAI, and AutoGen, and creating real-time production generative AI applications with LangChain and its add-ons.
📄 Page
13
The thesis of this book: if you read it end-to-end, you’ll understand that AI will not replace you—it will assist you. I compare it to Excel a quarter century ago: people similarly feared that the program would take the jobs of accountants, scientists, and analysts. Instead, it became an indispensable tool that empowered professionals to do more, faster, and better. AI is on a similar path.

Hopefully, you’ve anticipated relief. Otherwise, look at the wonderful vitamin C molecule generated by ChatGPT. If you’re not convinced, consider looking for a new profession.

Now that you feel safe—let’s begin.

I came to the United States with my mother and little sister when my father got a job at my homeland’s diplomatic mission to the United Nations in New York. There are tales of people conquering America with nothing more than a pair of pants and a few bucks in their wallets. My story was similar, except I knew only a bit more than a dozen words: numbers one through ten, cat, dog, mom, dad, and perhaps a few more. I was just six years old.

I vividly recall bringing home my first report card—a sea of Fs, peppered with some fortunate Cs and Ds. I probably could’ve tried explaining that such grades stood for Fabulous and Excellent, but, unfortunately, I didn’t know such words back then. It was a challenge to respond to test questions when the text seemed like an intercepted cryptogram. The only shining beacon was my performance in math. I got an A, scoring top of the class. I still remember the relief I felt when I encountered a test where the questions were 2 + 3 = ?. I yearned for a world where everything could be translated into numbers, enabling me to make friends and comprehend what my teacher was explaining.
📄 Page
14
Ironically, years later, working on multiple natural language processing (NLP) projects, I found myself helping computers excel at precisely the same task I had struggled with in my youth. Machines operate solely in numbers—they don’t understand the set of weird-looking characters we call letters.

Though I learned how to code in school, my passion led me toward chemistry and the silicon-first future of science. Throughout my PhD, I couldn’t help but notice the growing gap in terminology between the worlds of life sciences and software development. With two worlds speaking different languages, it was like witnessing a recurring déjà vu.

With the evolution of language models, the feeling that there was a possible solution kept growing inside me. When ChatGPT appeared, and the world started discovering myriad uses for it, I felt optimistic. Unfortunately, finding hallucinations and hiccups prevented me from using it to create potential scientific applications right away. It was only when I bumped into LangChain that I caught the eureka feeling of pieces falling into place.

In this book, we’ll learn how to develop and use applications that can automate scientific routines, help brainstorm, and speed up scientific and healthcare discoveries. We’ll build AI teams and unlock the potential of large language models and LangChain, LangGraph, and related components in life sciences and healthcare.

How to Read This Book

Disclaimer: All code is Python, unless mentioned otherwise, typically executed in a code editor or an integrated development environment (IDE). In the first few chapters, I’ve included all the imports. After that, to save space, some imports are dropped unless they are needed.

This isn’t the kind of book you may be used to reading. Pre-AI, most outputs didn’t take much space and were quite deterministic. You’ll find more than 150 code snippet examples in the book, and for most cases, I’ve included the code, the >execution step, the >>>execution results, and the ---Start/End of AI-generated content--- splitter to showcase long contexts or AI’s thoughts if verbose=True (consider this as AI thinking out loud):

    ...
    <code>
    ...
    >execution step
    ----------Start of AI-generated content (optional if long context)----------
    ----------End of AI-generated content (optional if long context)------------
    >>>execution results
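As a purely hypothetical illustration of this layout, the sketch below assembles the three parts in the order described above. The helper name `render_snippet_output`, the shortened delimiter strings, and the sample arguments are assumptions made for this example; they are not taken from the book's repository.

```python
from typing import Optional

# Delimiters modeled on the splitter described above (shortened; an assumption).
START = "----------Start of AI-generated content----------"
END = "----------End of AI-generated content----------"


def render_snippet_output(step: str, result: str,
                          ai_content: Optional[str] = None) -> str:
    """Assemble an output block: >step, optional AI content, >>>result.

    Hypothetical helper for illustration only.
    """
    lines = [f">{step}"]
    # The AI-generated section is optional and appears only when provided.
    if ai_content is not None:
        lines += [START, ai_content, END]
    lines.append(f">>>{result}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample strings are placeholders, not real model output.
    print(render_snippet_output("run the chain", "final answer",
                                ai_content="intermediate reasoning"))
```

Treating the AI-generated section as optional mirrors the "(optional if long context)" wording in the template: short outputs go straight from the execution step to the results.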
📄 Page
15
The code with bonuses is also located in the official GitHub LangChain4LifeScience repo in the form of Colab notebooks. This is done so you can read the book, learn about the raw model, and analyze the influence of code changes on AI responses from anywhere: the beach, metro, lab, or in front of your PC. This might be different from other books, but in this case, you can analyze the code we’ve written and the response of AI. I’ll keep maintaining the code and update it with the latest achievements.

In this book, I’ve primarily used OpenAI’s endpoints due to their popularity and quality. However, you’ll also find examples using Anthropic, Gemini, DeepSeek, and other model providers. The cost of running the complete code provided in the book is comparable to a cup of coffee.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.
📄 Page
16
This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/IvanReznikov/LangChain4LifeSciencesHealthcare.

If you have a technical question or a problem using the code examples, please send email to support@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “LangChain for Life Sciences and Healthcare by Ivan Reznikov (O’Reilly). Copyright 2025 Ivan Reznikov, 978-1-098-16263-4.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.
📄 Page
17
How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-827-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://oreilly.com/about/contact.html

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/langchain-for-life-sciences.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media.

Watch us on YouTube: https://youtube.com/oreillymedia.

Acknowledgments

I would like to express my sincere gratitude to Kerrie Holley, Christian Winkler, and Manish Mathur for their valuable insights, thoughtful reviews, and continuous encouragement throughout this work. A special thanks to the incredible team at O’Reilly for their support and guidance in bringing this project to life.
📄 Page
19
PART I Generative AI, Understanding Large Language Models, and LangChain