Author: Ivan Reznikov
Feeling overwhelmed by the volume of data in your research? Sifting through massive amounts of data to find useful insights is becoming increasingly difficult in drug discovery, genetics, and healthcare. Enter the era of generative AI with LangChain, whose tools are changing the way life scientists and researchers operate. In this book, Dr. Ivan Reznikov teaches you to harness the power of AI to elevate your research capabilities.
LangChain for Life Sciences and Healthcare
Innovation Through LLMs and Generative AI Agents
Ivan Reznikov
LangChain for Life Sciences and Healthcare
by Ivan Reznikov

Copyright © 2025 Ivan Reznikov. All rights reserved.

Published by O’Reilly Media, Inc., 141 Stony Circle, Suite 195 Santa Rosa, CA 95401.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Michelle Smith
Development Editor: Corbin Collins
Production Editor: Beth Kelly
Copyeditor: Piper Content Partners
Proofreader: Andrea Schein
Indexer: Judith McConville
Cover Designer: Karen Montgomery
Cover Illustrator: Karen Montgomery
Interior Designer: David Futato
Interior Illustrator: Kate Dullea

July 2025: First Edition
Revision History for the First Edition

2025-07-16: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098162634 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. LangChain for Life Sciences and Healthcare, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-16263-4

[LSI]
Dedication

To my wife Nastassia, my muse no AI can replace.
Preface

One Chain to link them all
One Chain to guide them,
One Chain to serve them all
And with AI insight them

- Lord of the Chains
(Hallucinations of the author after writing a book for more than a year. He understands a chain to be just a sequence of rings.)

For centuries, life scientists have painstakingly unraveled the mysteries of nature, one experiment, one observation at a time. But what if we could accelerate discovery exponentially? What if we could empower researchers with intelligent tools capable of not only analyzing vast datasets but also generating novel hypotheses and solutions?

The AI road map, according to several AI giants, may look as follows:

Chatbots: AI systems designed for conversational interactions, handling basic queries and tasks
Reasoners: Advanced AI capable of human-level problem-solving, analyzing complex data, and providing strategic solutions
Agents: Autonomous systems that can execute tasks, make decisions, and adapt to new situations without human intervention
Innovators: AI that contributes to invention and innovation, generating new ideas and optimizing designs
Organizations: Sophisticated AI entities capable of performing the work of an entire organization, managing operations, and making strategic decisions

At the time of this book’s publication, we’re moving from reasoners to agents. This book will allow you to go even further and develop your own agents and multi-agent innovators, capable of creating new ideas and brainstorming.

The book is separated into two parts. Part I is dedicated to setting the stage:

Chapter 1 surveys the modern generative AI landscape.
Chapter 2 discusses how large language models work.
Chapter 3 teaches how to use LangChain components.
Chapter 4 describes what hallucinations are, when to use them, and how to avoid them.
Chapter 5 explores ways in which LangChain applications can speed up general research with a debate machine and showcases simple LangGraph teams.
Part II is dedicated to research, life science domains, and building commercial applications:

Chapter 6 examines fine-tuned chemical models and teaches how to build multifunctional chemistry AI assistants.
Chapter 7 develops research teams with an AlphaFold team member, DNA generation agents, and many more, followed by fine-tuning a DeepSeek reasoning model on biological data.
Chapter 8 explores how to generate molecules with preset characteristics using variational autoencoders and looks into merging graph technologies with generative AI.
Chapter 9 builds a powerful LangGraph team of different AI superagents, responsible for speech-to-text, retrieving table data, generating reports, and performing hypothesis reasoning.
Chapter 10 concludes the book by discussing guardrails and best practices regarding data privacy, security, and compliance, looking into some alternatives such as LlamaIndex, CrewAI, and AutoGen, and creating live-time production generative AI applications with LangChain and its add-ons.
WARNING

The thesis of this book: if you read it end-to-end, you’ll understand that AI will not replace you; it will assist you. I compare it to Excel a quarter century ago: people similarly feared that the program would take the jobs of accountants, scientists, and analysts. Instead, it became an indispensable tool that empowered professionals to do more, faster, and better. AI is on a similar path.

Hopefully, you’ve anticipated relief. Otherwise, look at the wonderful vitamin C molecule generated by ChatGPT. If you’re not convinced, consider looking for a new profession.

Now that you feel safe, let’s begin.

I came to the United States with my mother and little sister when my father got a job at my homeland’s diplomatic mission to the United Nations in New York. There are tales of people conquering America with nothing more than a pair of pants and a few bucks in their wallets. My story was similar, except I knew only a bit more than a dozen words: numbers one through ten, cat, dog, mom, dad, and perhaps a few more. I was just six years old. I vividly recall bringing home my first report card: a sea of Fs, peppered with some fortunate Cs and Ds. I probably could’ve tried explaining that such grades stood for
Fabulous and Excellent, but, unfortunately, I didn’t know such words back then. It was a challenge to respond to test questions when the text seemed like an intercepted cryptogram. The only shining beacon was my performance in math. I got an A, scoring top of the class. I still remember the relief I felt when I encountered a test where the questions were 2 + 3 = ?. I yearned for a world where everything could be translated into numbers, enabling me to make friends and comprehend what my teacher was explaining.

Ironically, years later, working on multiple natural language processing (NLP) projects, I found myself helping computers excel at precisely the same task I had struggled with in my youth. Machines operate solely in numbers; they don’t understand the set of weird-looking characters we call letters.

Though I learned how to code in school, my passion led me toward chemistry and the silicon-first future of science. Throughout my PhD, I couldn’t help but notice the growing gap in terminology between the worlds of life sciences and software development. With two worlds speaking different languages, it was like witnessing a recurring déjà vu.

With the evolution of language models, the feeling that there was a possible solution kept growing inside me. When ChatGPT appeared, and the world started discovering myriad uses for it, I felt optimistic. Unfortunately, finding hallucinations and hiccups prevented me from using it to create potential scientific applications right away. It was only when I bumped into LangChain that I caught the eureka feeling of pieces falling into place.

In this book, we’ll learn how to develop and use applications that can automate scientific routines, help
brainstorm, and speed up scientific and healthcare discoveries. We’ll build AI teams and unlock the potential of large language models and of LangChain, LangGraph, and related components in life sciences and healthcare.

How to Read This Book

Disclaimer: All code is Python, unless mentioned otherwise, typically executed in a code editor or an integrated development environment (IDE). In the first few chapters, I’ve included all the imports. To save space, some of the imports are dropped later unless they are needed.

This isn’t a regular book you may be used to reading. Pre-AI, most outputs didn’t take much space and were quite deterministic. You’ll find more than 150 code snippet examples in the book, and in most cases, I’ve included the code, the >execution step, the >>>execution results, and the ---Start/End of AI-generated content--- splitter to showcase long contexts or AI’s thoughts if verbose=True (consider this as AI thinking out loud):

    ...
    <code>
    ...
    >execution step
    ----------Start of AI-generated content (optional if long context)----------
    ----------End of AI-generated content (optional if long context)----------
    >>>execution results

The code with bonuses is also located in the official GitHub LangChain4LifeScience repo in the form of Colab
notebooks. This is done so you can read the book, learn about the raw model, and analyze the influence of code changes on AI responses from anywhere: the beach, metro, lab, or in front of your PC. This might be different from other books, but in this case, you can analyze the code we’ve written and the response of AI. I’ll keep maintaining the code and update it with the latest achievements.

In this book, I’ve primarily used OpenAI’s endpoints due to their popularity and quality. However, you’ll also find examples using Anthropic, Gemini, DeepSeek, and other model providers. The cost of running the complete code provided in the book is comparable to a cup of coffee.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic: Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width: Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold: Shows commands or other text that should be typed literally by the user.
Constant width italic: Shows text that should be replaced with user-supplied values or by values determined by context.

TIP: This element signifies a tip or suggestion.
NOTE: This element signifies a general note.
WARNING: This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/IvanReznikov/LangChain4LifeSciencesHealthcare.

If you have a technical question or a problem using the code examples, please send email to support@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission.
Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “LangChain for Life Sciences and Healthcare by Ivan Reznikov (O’Reilly). Copyright 2025 Ivan Reznikov, 978-1-098-16263-4.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

NOTE

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.
How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-827-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://oreilly.com/about/contact.html

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/langchain-for-life-sciences.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media.

Watch us on YouTube: https://youtube.com/oreillymedia.
Acknowledgments

I would like to express my sincere gratitude to Kerrie Holley, Christian Winkler, and Manish Mathur for their valuable insights, thoughtful reviews, and continuous encouragement throughout this work. A special thanks to the incredible team at O’Reilly for their support and guidance in bringing this project to life.
Part I. Generative AI, Understanding Large Language Models, and LangChain
Chapter 1. From Statistics to Generative AI in Life Sciences

This chapter traces the current role of AI and how modern tools like LangChain help us build tomorrow’s applications, from molecular design to clinical analysis. Our journey will cover:

The emergence of generative AI and its unique capabilities in life sciences
Key differences between traditional AI and generative approaches
Practical applications across biology, chemistry, and healthcare
Essential considerations and limitations when building AI applications

This foundation will prepare you for the hands-on chapters that follow, where we’ll implement these concepts using LangChain to solve real-world life sciences and medical problems. Understanding this context will help you decide when and how to apply different AI approaches in your work.
Introduction

In 1958, Herbert Simon and Allen Newell wrote, “In ten years, a digital computer will become the world chess champion” and “In ten years, a digital computer will discover and prove a new important mathematical theorem.”

In 1965, Simon predicted that “in twenty years, machines will be capable of performing any work that a human can perform.”

In 1970, Marvin Minsky stated in an interview with Life magazine: “In three to eight years, we will have a machine with the general intelligence of an average human.”

Hans Moravec, in 1988, predicted that by 2010, robots would be able to perform most human tasks and significantly surpass human intelligence by 2040.

In “The Age of Spiritual Machines” (1999), Ray Kurzweil predicted that by 2029, AI would pass a valid Turing test and achieve human levels of intelligence.

As you might’ve guessed, not all the forecasts came true, though some came close. After yet another breakthrough, humanity becomes quite optimistic about the future. Believing in a bright tomorrow around the corner usually makes us think that what was once the realm of sci-fi is now reachable. For example, once ChatGPT was hyped, I started to notice news regarding not only GPT-4, an early release of which was available three months later, but also GPT-5, 6, 7…

Though used earlier, the term data science started to become more widespread in the 1990s, indicating the potential power of using data. Its emergence was primarily due to the fact that parallel to computational power development, storage capacities also grew very quickly, allowing the digitization of a large amount of data. Such
rapid evolution reflected the transition from manual data analysis and interpretation to the automated, intelligent systems that later defined the digital age. The role of data science is to create a scientific framework that works with such an exponentially increasing amount of data.

Around the turn of the 21st century, the application of continuously evolving AI in life sciences expanded. In 2003, the Human Genome Project was completed, which provided unprecedented genetic data for AI systems to analyze, and this led to significant breakthroughs in understanding genetic diseases. The release of IBM’s Watson and its question-answering capabilities in 2007 showcased the potential for AI to assist in life sciences and healthcare fields with complex diagnostics and treatment planning. In 2016, Arterys became the first AI company to receive FDA clearance to use cloud-based deep learning in a clinical setting.

Google’s DeepMind predicted protein structures in 2020, but at the time of writing, despite having over 150 small-molecule candidates in discovery phases and more than 15 in clinical trials, no drugs developed entirely through artificial intelligence have received approval from the FDA. Nevertheless, the drug discovery pipeline continues to expand rapidly. Among the most significant achievements are DSP-1181, the first AI-designed drug to enter clinical trials, developed through collaboration between Exscientia and Sumitomo Dainippon Pharma, and INS018-055, the first antifibrotic small molecule inhibitor to be both discovered and designed by AI, produced by Insilico Medicine. Though the development of DSP-1181 was accelerated dramatically, reaching clinical testing in just 12 months compared to the traditional 5-year timeline, it was ultimately discontinued in Phase I trials in July 2022 for failing to meet evaluation