Mayo Oshin & Nuno Campos Learning LangChain Building AI and LLM Applications with LangChain and LangGraph
ISBN: 978-1-098-16728-8    US $79.99    CAN $99.99

If you’re looking to build production-ready AI applications that can reason and retrieve external data for context-awareness, you’ll need to master LangChain—a popular development framework and platform for building, running, and managing agentic applications. LangChain is used by several leading companies, including Zapier, Replit, Databricks, and many more.

This guide is an indispensable resource for developers who understand Python or JavaScript but are beginners eager to harness the power of AI. Authors Mayo Oshin and Nuno Campos demystify the use of LangChain through practical insights and in-depth tutorials. Starting with basic concepts, this book shows you step-by-step how to build a production-ready AI agent that uses your data.

• Harness the power of retrieval-augmented generation (RAG) to enhance the accuracy of LLMs using external up-to-date data
• Develop and deploy AI applications that interact intelligently and contextually with users
• Make use of the powerful agent architecture with LangGraph
• Integrate and manage third-party APIs and tools to extend the functionality of your AI applications
• Monitor, test, and evaluate your AI applications to improve performance
• Understand the foundations of LLM app development and how they can be used with LangChain

“With clear explanations and actionable techniques, this is the go-to resource for anyone looking to harness LangChain’s power for production-ready generative AI and agents. A must-read for developers aiming to push the boundaries of this platform.”
—Tom Taulli, IT consultant and author of AI-Assisted Programming

“This comprehensive guide covers everything from document retrieval and indexing to deploying and monitoring AI agents in production. With engaging examples, intuitive illustrations, and hands-on code, this book made learning LangChain interesting and fun!”
—Rajat K. Goel, senior software engineer, IBM

Mayo Oshin is a tech entrepreneur, AI advisor, and angel investor. He was an early developer contributor and advocate for the LangChain open source library and a pioneer in the popular AI “chat with data” movement.

Nuno Campos is a founding software engineer at LangChain. He has a decade of experience as a Python and JavaScript software engineer, architect, and open source maintainer.
Praise for Learning LangChain

With clear explanations and actionable techniques, this is the go-to resource for anyone looking to harness LangChain’s power for production-ready generative AI and agents. A must-read for developers aiming to push the boundaries of this platform.
—Tom Taulli, IT consultant and author of AI-Assisted Programming

This comprehensive guide on LangChain covers everything from document retrieval and indexing to deploying and monitoring AI agents in production. With engaging examples, intuitive illustrations, and hands-on code, this book made learning LangChain interesting and fun!
—Rajat K. Goel, senior software engineer, IBM

This book is a comprehensive LLM guide covering fundamentals to production, packed with technical insights, practical strategies, and robust AI patterns.
—Gourav Singh Bais, senior data scientist and senior technical content writer, Allianz Services

Prototyping generative AI apps is easy—shipping them is hard. The strategies and tools in Learning LangChain make it possible to turn ideas into modern, production-ready applications.
—James Spiteri, director of product management for security, Elastic

Learning LangChain provides a clear path for transforming how you build AI-powered applications. By breaking down flexible architectures and robust checkpointing, it offers a strong foundation for creating reliable, production-ready AI agents at scale.
—David O’Regan, engineering manager for AI/ML, GitLab

Learning LangChain helped us skip the boilerplate for debugging and monitoring. The many helpful patterns and tooling insights allowed us to move fast and deploy AI apps with confidence.
—Chris Focke, chief AI scientist, AppFolio

Teaching LangChain through clear, actionable examples, this book is a gateway to agentic applications that are as inspiring as Asimov’s sci-fi novels.
—Ilya Meyzin, SVP head of data science, Dun & Bradstreet
Mayo Oshin and Nuno Campos Learning LangChain Building AI and LLM Applications with LangChain and LangGraph
Learning LangChain
by Mayo Oshin and Nuno Campos

Copyright © 2025 Olumayowa “Mayo” Olufemi Oshin and O’Reilly Media, Inc. All rights reserved.

Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Nicole Butterfield
Development Editor: Corbin Collins
Production Editor: Clare Laylock
Copyeditor: nSight, Inc.
Proofreader: Helena Stirling
Indexer: Judith McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

February 2025: First Edition

Revision History for the First Edition
2025-02-13: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098167288 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Learning LangChain, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

ISBN: 978-1-098-16728-8
[LSI]
Table of Contents

Preface  ix

1. LLM Fundamentals with LangChain  1
    Getting Set Up with LangChain  3
    Using LLMs in LangChain  4
    Making LLM Prompts Reusable  7
    Getting Specific Formats out of LLMs  13
    JSON Output  14
    Other Machine-Readable Formats with Output Parsers  15
    Assembling the Many Pieces of an LLM Application  16
    Using the Runnable Interface  16
    Imperative Composition  18
    Declarative Composition  20
    Summary  22

2. RAG Part I: Indexing Your Data  23
    The Goal: Picking Relevant Context for LLMs  24
    Embeddings: Converting Text to Numbers  25
    Embeddings Before LLMs  25
    LLM-Based Embeddings  27
    Semantic Embeddings Explained  27
    Converting Your Documents into Text  30
    Splitting Your Text into Chunks  32
    Generating Text Embeddings  36
    Storing Embeddings in a Vector Store  39
    Getting Set Up with PGVector  40
    Working with Vector Stores  41
    Tracking Changes to Your Documents  44
    Indexing Optimization  48
    MultiVectorRetriever  49
    RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval  53
    ColBERT: Optimizing Embeddings  54
    Summary  56

3. RAG Part II: Chatting with Your Data  57
    Introducing Retrieval-Augmented Generation  57
    Retrieving Relevant Documents  59
    Generating LLM Predictions Using Relevant Documents  63
    Query Transformation  68
    Rewrite-Retrieve-Read  68
    Multi-Query Retrieval  71
    RAG-Fusion  74
    Hypothetical Document Embeddings  78
    Query Routing  81
    Logical Routing  81
    Semantic Routing  84
    Query Construction  87
    Text-to-Metadata Filter  87
    Text-to-SQL  90
    Summary  92

4. Using LangGraph to Add Memory to Your Chatbot  95
    Building a Chatbot Memory System  96
    Introducing LangGraph  98
    Creating a StateGraph  101
    Adding Memory to StateGraph  105
    Modifying Chat History  107
    Trimming Messages  107
    Filtering Messages  110
    Merging Consecutive Messages  112
    Summary  114

5. Cognitive Architectures with LangGraph  115
    Architecture #1: LLM Call  118
    Architecture #2: Chain  121
    Architecture #3: Router  125
    Summary  133

6. Agent Architecture  135
    The Plan-Do Loop  136
    Building a LangGraph Agent  139
    Always Calling a Tool First  143
    Dealing with Many Tools  148
    Summary  153

7. Agents II  155
    Reflection  155
    Subgraphs in LangGraph  161
    Calling a Subgraph Directly  162
    Calling a Subgraph with a Function  164
    Multi-Agent Architectures  165
    Supervisor Architecture  167
    Summary  170

8. Patterns to Make the Most of LLMs  171
    Structured Output  173
    Intermediate Output  176
    Streaming LLM Output Token-by-Token  178
    Human-in-the-Loop Modalities  179
    Multitasking LLMs  186
    Summary  189

9. Deployment: Launching Your AI Application into Production  191
    Prerequisites  191
    Install Dependencies  192
    Large Language Model  192
    Vector Store  193
    Backend API  197
    Create a LangSmith Account  199
    Understanding the LangGraph Platform API  200
    Data Models  201
    Features  202
    Deploying Your AI Application on LangGraph Platform  204
    Create a LangGraph API Config  204
    Test Your LangGraph App Locally  205
    Deploy from the LangSmith UI  207
    Launch LangGraph Studio  210
    Security  213
    Summary  214

10. Testing: Evaluation, Monitoring, and Continuous Improvement  215
    Testing Techniques Across the LLM App Development Cycle  216
    The Design Stage: Self-Corrective RAG  217
    The Preproduction Stage  224
    Creating Datasets  224
    Defining Your Evaluation Criteria  228
    Regression Testing  234
    Evaluating an Agent’s End-to-End Performance  235
    Production  247
    Tracing  247
    Collect Feedback in Production  248
    Classification and Tagging  249
    Monitoring and Fixing Errors  249
    Summary  250

11. Building with LLMs  251
    Interactive Chatbots  252
    Collaborative Editing with LLMs  254
    Ambient Computing  255
    Summary  257

Index  259
Preface

On November 30, 2022, San Francisco–based firm OpenAI publicly released ChatGPT—the viral AI chatbot that can generate content, answer questions, and solve problems like a human. Within two months of its launch, ChatGPT attracted over 100 million monthly active users, the fastest adoption rate of a new consumer technology application (so far). ChatGPT is a chatbot experience powered by an instruction- and dialogue-tuned version of OpenAI’s GPT-3.5 family of large language models (LLMs). We’ll get to definitions of these concepts very shortly.

Building LLM applications with or without LangChain requires the use of an LLM. In this book we will be making use of the OpenAI API as the LLM provider in the code examples (pricing is listed on its platform). One of the benefits of working with LangChain is that you can follow along with all of these examples using either OpenAI or alternative commercial or open source LLM providers.

Three months later, OpenAI released the ChatGPT API, giving developers access to its chat and speech-to-text capabilities. This kickstarted an uncountable number of new applications and technical developments under the loose umbrella term of generative AI.

Before we define generative AI and LLMs, let’s touch on the concept of machine learning (ML). Some computer algorithms (imagine a repeatable recipe for achieving some predefined task, such as sorting a deck of cards) are directly written by a software engineer. Other computer algorithms are instead learned from vast amounts of training examples—the job of the software engineer shifts from writing the algorithm itself to writing the training logic that creates the algorithm. A lot of attention in the ML field went into developing algorithms for predicting any number of things, from tomorrow’s weather to the most efficient delivery route for an Amazon driver.
With the advent of LLMs and other generative models (such as diffusion models for generating images, which we don’t cover in this book), those same ML techniques are now applied to the problem of generating new content, such as a new paragraph of text or a drawing, that is at the same time unique and informed by examples in the training data. LLMs in particular are generative models dedicated to generating text.

LLMs have two other differences from previous ML algorithms:

• They are trained on much larger amounts of data; training one of these models from scratch would be very costly.
• They are more versatile. The same text generation model can be used for summarization, translation, classification, and so forth, whereas previous ML models were usually trained and used for a specific task.

These two differences conspire to make the job of the software engineer shift once more, with increasing amounts of time dedicated to working out how to get an LLM to work for their use case. And that’s what LangChain is all about.

By the end of 2023, competing LLMs had emerged, including Anthropic’s Claude and Google’s Bard (later renamed Gemini), providing even wider access to these new capabilities. Subsequently, thousands of successful startups and major enterprises have incorporated generative AI APIs to build applications for various use cases, ranging from customer support chatbots to writing and debugging code.

On October 22, 2022, Harrison Chase published the first commit on GitHub for the LangChain open source library. LangChain started from the realization that the most interesting LLM applications needed to use LLMs together with “other sources of computation or knowledge.” For instance, you can try to get an LLM to generate the answer to this question:

How many balls are left after splitting 1,234 balls evenly among 123 people?

You’ll likely be disappointed by its math prowess.
However, if you pair it up with a calculator function, you can instead instruct the LLM to reword the question into an input that a calculator could handle:

1,234 % 123

Then you can pass that to a calculator function and get an accurate answer to your original question. LangChain was the first (and, at the time of writing, the largest) library to provide such building blocks and the tooling to reliably combine them into larger applications.

Before discussing what it takes to build compelling applications with these new tools, let’s get more familiar with LLMs and LangChain.
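The calculator hand-off described above can be sketched in a few lines. This is our own minimal illustration, not a LangChain API: the LLM’s only job is to reword the question into the expression `1234 % 123`, and plain code does the arithmetic.

```python
# Sketch of the LLM + calculator pairing: the model rewords the question
# into an arithmetic expression, and ordinary code computes the answer.
# Splitting 1,234 balls evenly among 123 people leaves 1234 % 123 balls.

def calculator(a: int, b: int) -> int:
    """Return the remainder of a divided by b."""
    return a % b

print(calculator(1234, 123))  # 4 balls left over
```

The LLM handles the language understanding; the deterministic function handles the part LLMs are bad at.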
1. Tom B. Brown et al., “Language Models Are Few-Shot Learners”, arXiv, July 22, 2020.
2. Xiang Zhang et al., “Don’t Trust ChatGPT When Your Question Is Not in English: A Study of Multilingual Abilities and Types of LLMs”, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, December 6–10, 2023.

Brief Primer on LLMs

In layman’s terms, LLMs are trained algorithms that receive text input and predict and generate humanlike text output. Essentially, they behave like the familiar autocomplete feature found on many smartphones, but taken to an extreme.

Let’s break down the term large language model:

• Large refers to the size of these models in terms of training data and parameters used during the learning process. For example, OpenAI’s GPT-3 model contains 175 billion parameters, which were learned from training on 45 terabytes of text data.1 Parameters in a neural network model are made up of the numbers that control the output of each neuron and the relative weight of its connections with its neighboring neurons. (Exactly which neurons are connected to which other neurons varies for each neural network architecture and is beyond the scope of this book.)

• Language model refers to a computer algorithm trained to receive written text (in English or other languages) and produce output also as written text (in the same language or a different one). These are neural networks, a type of ML model that resembles a stylized conception of the human brain, with the final output resulting from the combination of the individual outputs of many simple mathematical functions, called neurons, and their interconnections. If many of these neurons are organized in specific ways, with the right training process and the right training data, this produces a model capable of interpreting the meaning of individual words and sentences, which makes it possible to use them for generating plausible, readable, written text.
Because of the prevalence of English in the training data, most models are better at English than they are at other languages with a smaller number of speakers. By “better” we mean it is easier to get them to produce desired outputs in English. There are LLMs designed for multilingual output, such as BLOOM, that use a larger proportion of training data in other languages. Curiously, the difference in performance between languages isn’t as large as might be expected, even in LLMs trained on a predominantly English training corpus. Researchers have found that LLMs are able to transfer some of their semantic understanding to other languages.2
Put together, large language models are instances of big, general-purpose language models that are trained on vast amounts of text. In other words, these models have learned from patterns in large datasets of text—books, articles, forums, and other publicly available sources—to perform general text-related tasks. These tasks include text generation, summarization, translation, classification, and more.

Let’s say we instruct an LLM to complete the following sentence:

The capital of England is _______.

The LLM will take that input text and predict the correct output answer as London. This looks like magic, but it’s not. Under the hood, the LLM estimates the probability of a sequence of word(s) given a previous sequence of words.

Technically speaking, the model makes predictions based on tokens, not words. A token represents an atomic unit of text. Tokens can represent individual characters, words, subwords, or even larger linguistic units, depending on the specific tokenization approach used. For example, using GPT-3.5’s tokenizer (called cl100k), the phrase Good morning dearest friend would consist of five tokens (using _ to show the space character):

Good (token ID 19045)
_morning (token ID 6693)
_de (token ID 409)
arest (token ID 15795)
_friend (token ID 4333)

Usually tokenizers are trained with the objective of having the most common words encoded into a single token; for example, the word morning is encoded as the single token 6693. Less common words, or words in other languages (usually tokenizers are trained on English text), require several tokens to encode. For example, the word dearest is encoded as tokens 409 and 15795. One token spans on average four characters of common English text, or roughly three-quarters of a word.
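The encoding above can be mimicked with a toy greedy longest-match tokenizer. The vocabulary below is hard-coded from the five tokens just listed; a real tokenizer like cl100k instead uses a learned byte-pair-encoding vocabulary of roughly 100,000 entries, so this is only an illustration of the idea.

```python
# Toy tokenizer: at each position, greedily match the longest known piece.
# The vocabulary is hard-coded from the example above; real BPE tokenizers
# learn theirs from training data.

VOCAB = {"Good": 19045, " morning": 6693, " de": 409, "arest": 15795, " friend": 4333}

def tokenize(text: str) -> list[int]:
    ids, i = [], 0
    pieces = sorted(VOCAB, key=len, reverse=True)  # prefer the longest match
    while i < len(text):
        piece = next(p for p in pieces if text.startswith(p, i))
        ids.append(VOCAB[piece])
        i += len(piece)
    return ids

print(tokenize("Good morning dearest friend"))
# [19045, 6693, 409, 15795, 4333]
```

Note how "dearest" is split across two tokens (409 and 15795) because it is a less common word, exactly as described above.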
3. For more information, see Ashish Vaswani et al., “Attention Is All You Need”, arXiv, June 12, 2017.

The driving engine behind LLMs’ predictive power is known as the transformer neural network architecture.3 The transformer architecture enables models to handle sequences of data, such as sentences or lines of code, and make predictions about the likeliest next word(s) in the sequence. Transformers are designed to understand the context of each word in a sentence by considering it in relation to every other word. This allows the model to build a comprehensive understanding of the meaning of a sentence, paragraph, and so on (in other words, a sequence of words) as the joint meaning of its parts in relation to each other.

So, when the model sees the sequence of words the capital of England is, it makes a prediction based on similar examples it saw during its training. In the model’s training corpus, the word England (or the token(s) that represent it) would often have shown up in sentences in similar places to words like France, United States, and China. The word capital would figure in the training data in many sentences also containing words like England, France, and US, and words like London, Paris, and Washington. This repetition during the model’s training resulted in the capacity to correctly predict that the next word in the sequence should be London.

The instructions and input text you provide to the model are called a prompt. Prompting can have a significant impact on the quality of output from the LLM. There are several best practices for prompt design, or prompt engineering, including providing clear and concise instructions with contextual examples, which we discuss later in this book. Before we go further into prompting, let’s look at some different types of LLMs available for you to use.
Pretrained LLMs

The base type, from which all the others derive, is commonly known as a pretrained LLM: it has been trained on very large amounts of text (found on the internet and in books, newspapers, code, video transcripts, and so forth) in a self-supervised fashion. This means that—unlike in supervised ML, where prior to training the researcher needs to assemble a dataset of pairs of input to expected output—for LLMs those pairs are inferred from the training data. In fact, the only feasible way to use datasets that are so large is to assemble those pairs from the training data automatically. Two techniques to do this involve having the model do the following:

Predict the next word
Remove the last word from each sentence in the training data, and that yields a pair of input and expected output, such as The capital of England is ___ and London.
Predict a missing word
Similarly, if you take each sentence and omit a word from the middle, you now have other pairs of input and expected output, such as The ___ of England is London and capital.

These models are quite difficult to use as is, as they require you to prime the response with a suitable prefix. For instance, if you want to know the capital of England, you might get a response by prompting the model with The capital of England is, but not with the more natural What is the capital of England?

Instruction-Tuned LLMs

Researchers have made pretrained LLMs easier to use by further training them (additional training applied on top of the long and costly training described in the previous section), also known as fine-tuning, on the following:

Task-specific datasets
These are datasets of question/answer pairs manually assembled by researchers, providing examples of desirable responses to common questions that end users might prompt the model with. For example, the dataset might contain the following pair:

Q: What is the capital of England?
A: The capital of England is London.

Unlike the pretraining datasets, these are manually assembled, so they are by necessity much smaller.

Reinforcement learning from human feedback (RLHF)
Through the use of RLHF methods, those manually assembled datasets are augmented with user feedback received on output produced by the model. For example, user A preferred The capital of England is London to London is the capital of England as an answer to the earlier question.

Instruction-tuning has been key to broadening the number of people who can build applications with LLMs, as these models can now be prompted with instructions, often in the form of questions such as What is the capital of England?, as opposed to The capital of England is.

Dialogue-Tuned LLMs

Models tailored for dialogue or chat purposes are a further enhancement of instruction-tuned LLMs.
Different providers of LLMs use different techniques, so this is not necessarily true of all chat models, but usually this is done via the following:

Dialogue datasets
The manually assembled fine-tuning datasets are extended to include more examples of multiturn dialogue interactions, that is, sequences of prompt-reply pairs.
Chat format
The input and output formats of the model are given a layer of structure over freeform text, which divides text into parts associated with a role (and optionally other metadata, like a name). Usually, the roles available are system (for instructions and framing of the task), user (the actual task or question), and assistant (for the outputs of the model). This method evolved from early prompt engineering techniques and makes it easier to tailor the model’s output while making it harder for models to confuse user input with instructions. Confusing user input with prior instructions is also known as jailbreaking, which can, for instance, lead to carefully crafted prompts, possibly including trade secrets, being exposed to end users.

Fine-Tuned LLMs

Fine-tuned LLMs are created by taking base LLMs and further training them on a proprietary dataset for a specific task. Technically, instruction-tuned and dialogue-tuned LLMs are fine-tuned LLMs, but the term “fine-tuned LLM” is usually used to describe LLMs that are tuned by the developer for their specific task. For example, a model can be fine-tuned to accurately extract the sentiment, risk factors, and key financial figures from a public company’s annual report. Usually, fine-tuned models have improved performance on the chosen task at the expense of a loss of generality; that is, they become less capable of answering queries on unrelated tasks.

Throughout the rest of this book, when we use the term LLM, we mean instruction-tuned LLMs, and by chat model we mean dialogue-tuned LLMs, as defined earlier in this section. These should be your workhorses when using LLMs—the first tools you reach for when starting a new LLM application. Now let’s quickly discuss some common LLM prompting techniques before diving into LangChain.
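Before moving on, the chat format described earlier is worth seeing concretely: instead of one freeform string, the conversation is a list of role-tagged messages. The snippet below is a minimal sketch using the common system/user/assistant convention; the exact field names vary slightly by provider.

```python
# Role-tagged chat messages: a layer of structure over freeform text.
messages = [
    {"role": "system", "content": "You are a helpful geography tutor."},
    {"role": "user", "content": "What is the capital of England?"},
]

# The model's reply comes back tagged with the assistant role:
reply = {"role": "assistant", "content": "The capital of England is London."}

print([m["role"] for m in messages + [reply]])
# ['system', 'user', 'assistant']
```

Keeping instructions in the system message and user input in user messages is exactly what makes it harder for user input to be confused with instructions.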
Brief Primer on Prompting

As we touched on earlier, the main task of the software engineer working with LLMs is not to train an LLM, or even (usually) to fine-tune one, but rather to take an existing LLM and work out how to get it to accomplish the task you need for your application. There are commercial providers of LLMs, like OpenAI, Anthropic, and Google, as well as open source LLMs (Llama, Gemma, and others) released free of charge for others to build upon. Adapting an existing LLM for your task is called prompt engineering.

Many prompting techniques have been developed in the past two years, and in a broad sense, this is a book about how to do prompt engineering with LangChain—how to use LangChain to get LLMs to do what you have in mind. But before we get
into LangChain proper, it helps to go over some of these techniques first (and we apologize in advance if your favorite prompting technique isn’t listed here; there are too many to cover).

To follow along with this section, we recommend copying these prompts into the OpenAI Playground to try them yourself:

1. Create an account for the OpenAI API at http://platform.openai.com, which will let you use OpenAI LLMs programmatically, that is, using the API from your Python or JavaScript code. It will also give you access to the OpenAI Playground, where you can experiment with prompts from your web browser.

2. If necessary, add payment details for your new OpenAI account. OpenAI is a commercial provider of LLMs and charges a fee each time you use its models through the API or through the Playground. You can find the latest pricing on its website. Over the past two years, the price for using OpenAI’s models has come down significantly as new capabilities and optimizations are introduced.

3. Head on over to the OpenAI Playground and you’re ready to try out the following prompts for yourself. We’ll make use of the OpenAI API throughout this book.

4. Once you’ve navigated to the Playground, you will see a panel of presets on the right side of the screen, including your model of choice. Further down the panel, under the “Model configuration” title, you will see Temperature. Move the Temperature toggle from middle to left until the number shows 0.00. Essentially, temperature controls the randomness of LLM output: the lower the temperature, the more deterministic the model output.

Now on to the prompts!

Zero-Shot Prompting

The first and most straightforward prompting technique consists of simply instructing the LLM to perform the desired task:

How old was the 30th president of the United States when his wife's mother died?
This is typically what you should try first, and it will usually work for simple questions, especially when the answer was likely present in some of the training data. If we prompt OpenAI’s gpt-3.5-turbo with the preceding prompt, the following is returned:

The 30th president of the United States, Calvin Coolidge, was 48 years old when his wife's mother passed away in 1926.
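The same zero-shot call can be made programmatically. Below is a sketch of the request payload only, mirroring the Playground setup above (temperature 0, one user message); the field names follow the common chat-completions request shape, and actually sending it requires an HTTP client and your API key.

```python
# Request payload for the zero-shot prompt. Temperature 0 minimizes
# randomness, matching the Playground configuration described above.
payload = {
    "model": "gpt-3.5-turbo",
    "temperature": 0,
    "messages": [
        {
            "role": "user",
            "content": (
                "How old was the 30th president of the United States "
                "when his wife's mother died?"
            ),
        }
    ],
}

print(payload["temperature"], len(payload["messages"]))  # 0 1
```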
4. Ryan Liu et al., “Mind Your Step (by Step): Chain-of-Thought Can Reduce Performance on Tasks Where Thinking Makes Humans Worse”, arXiv, November 8, 2024.

You may get a different result from what we get. There is an element of randomness to how LLMs generate responses, and OpenAI may have updated the model by the time you try it.

While the model did correctly identify the 30th president, the answer isn’t quite right. Often you’ll have to iterate on prompts and responses to get a reliable system. The next sections expand on how.

Chain-of-Thought

A very useful iteration is to further instruct the model to take the time to think. This technique has been found to increase performance on a variety of tasks. Curiously, a more recent paper4 suggests that this technique can also reduce LLM performance on tasks where thinking reduces human performance. Called chain-of-thought (CoT) prompting, this is usually done by prepending the prompt with instructions for the LLM to describe how it could arrive at the answer:

Think step by step. How old was the 30th president of the United States when his wife's mother died?

And now let’s see what the same model produces for this new prompt:

1. First, find out who the 30th president of the United States was. This can be done by researching a list of US presidents or by searching for "30th president of the United States" online. You will find that the 30th president was Calvin Coolidge.

2. Next, find out the year when Calvin Coolidge was the president. This can also be found by researching online or in a history book. You will find that he was the president from 1923-1929.

3. Now, you need to determine the year when Calvin Coolidge's wife's mother died. This can be done by researching the life and death of his wife, Grace Goodhue Coolidge. You will find that she lived from 1879-1957.

4. Calculate the age of Calvin Coolidge's wife's mother when she died. She was 78 years old.

5. Determine the age of Calvin Coolidge when his wife's mother died. Subtract the year of his birth, 1872, from the year his mother-in-law died, 1957. This means that Calvin Coolidge was 85 years old when his wife's mother passed away.
Notice how the only change in the prompt was the addition of Think step by step, and how much of a difference that made in the model’s output. Now the model outputs a sequence of steps you could go through when trying to answer this question. Unfortunately, while the steps are very reasonable, the final output is still incorrect: the model got some facts wrong, namely the birth and death years of President Coolidge’s wife’s mother. Let’s see how to improve on this.

Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) consists of finding relevant pieces of text, also known as context, such as facts you’d find in an encyclopedia, and including that context in the prompt. The RAG technique can (and in real applications should) be combined with CoT, but for simplicity we’ll use these techniques one at a time here. Here’s the prompt including RAG:

Context:
- Calvin Coolidge (born John Calvin Coolidge Jr.; /ˈkuːlɪdʒ/; July 4, 1872 – January 5, 1933) was an American attorney and politician who served as the 30th president of the United States from 1923 to 1929.
- Grace Anna Coolidge (née Goodhue; January 3, 1879 – July 8, 1957) was the wife of the 30th president of the United States, Calvin Coolidge.
- Grace Anna Goodhue was born on January 3, 1879, in Burlington, Vermont, the only child of Andrew Issachar Goodhue and Lemira Barrett Goodhue.
- Lemira A. Goodhue (Barrett); Birthdate: April 26, 1849; Birthplace: Burlington, Chittenden County, VT, United States; Death: October 24, 1929.

How old was the 30th president of the United States when his wife's mother died?

And the output from the model:

The 30th president of the United States, Calvin Coolidge, was 54 years old when his wife's mother, Lemira A. Goodhue, died on October 24, 1929.

Now we’re a lot closer to the correct answer, but as we touched on earlier, LLMs aren’t great at out-of-the-box math. In this case, the final result of 54 years old is off by 3.
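Assembling a RAG prompt like the one above is, at its core, just string construction: retrieved snippets are prepended to the question. A minimal sketch of that assembly step follows (the helper name is ours; how the snippets are retrieved in the first place is the subject of Chapters 2 and 3):

```python
# Assemble a RAG prompt: bullet the retrieved context, then the question.
def build_rag_prompt(snippets: list[str], question: str) -> str:
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Context:\n{context}\n\n{question}"

prompt = build_rag_prompt(
    [
        "Calvin Coolidge (July 4, 1872 - January 5, 1933) served as the "
        "30th president of the United States from 1923 to 1929.",
        "Lemira A. Goodhue died on October 24, 1929.",
    ],
    "How old was the 30th president of the United States when his wife's "
    "mother died?",
)
print(prompt.startswith("Context:\n- Calvin"))  # True
```

The snippets above are abbreviated versions of the context shown earlier, for brevity.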
Let’s see how we can improve on this.

Tool Calling

The tool calling technique consists of prepending the prompt with a list of external functions the LLM can make use of, along with descriptions of what each is good for and instructions on how to signal in the output that it wants to use one (or more) of these functions. Finally, you—the developer of the application—should parse the output and call the appropriate functions. Here’s one way to do this:
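(The book’s actual example continues beyond this excerpt. As a stand-in, here is one hypothetical sketch of the parse-and-dispatch step, assuming the model has been instructed to reply with `CALL <tool>: <input>` when it wants a tool; the protocol and names here are made up for illustration, and real providers expose structured tool-calling APIs instead.)

```python
# Hypothetical tool-calling dispatch: parse the model's reply and, if it
# requests a tool, run the tool and return its result.

TOOLS = {
    # Toy arithmetic evaluator; never eval untrusted input in real code.
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def handle_model_output(output: str) -> str:
    if output.startswith("CALL "):
        name, _, arg = output.removeprefix("CALL ").partition(": ")
        return TOOLS[name](arg.strip())
    return output  # ordinary answer, no tool requested

print(handle_model_output("CALL calculator: 1234 % 123"))  # 4
```

In a real application the tool result would be fed back to the model so it can phrase the final answer; that loop is exactly what the agent architectures later in the book automate.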