Designing Large Language Model Applications: A Holistic Approach to LLMs (Suhas Pai)

Author: Suhas Pai


Large language models (LLMs) have proven themselves to be powerful tools for solving a wide range of tasks, and enterprises have taken note. But transitioning from demos and prototypes to full-fledged applications can be difficult. This book helps close that gap, providing the tools, techniques, and playbooks that practitioners need to build useful products that incorporate the power of language models.

Experienced ML researcher Suhas Pai offers practical advice on harnessing LLMs for your use cases and dealing with commonly observed failure modes. You'll take a comprehensive deep dive into the ingredients that make up a language model, explore various techniques for customizing them such as fine-tuning, learn about application paradigms like RAG (retrieval-augmented generation) and agents, and more.

• Understand how to prepare datasets for training and fine-tuning
• Develop an intuition about the Transformer architecture and its variants
• Adapt pretrained language models to your own domain and use cases
• Learn effective techniques for fine-tuning, domain adaptation, and inference optimization
• Interface language models with external tools and data and integrate them into an existing software ecosystem

📄 File Format: PDF
💾 File Size: 4.3 MB

📄 Text Preview (First 20 pages)


📄 Page 1
Designing Large Language Model Applications
A Holistic Approach to LLMs
Suhas Pai
📄 Page 2
ISBN: 978-1-098-15050-1 | US $79.99 | CAN $99.99 | DATA

Suhas Pai is the cofounder, CTO, and ML research lead at Hudson Labs, a Y Combinator-backed AI and fintech startup. He has contributed to the development of several open source LLMs, including the BLOOM LLM project at BigScience, where he was colead of the privacy working group.

"You can spend weeks or months trawling through endless papers, tools, and benchmarks to develop the level of intuition for how LLMs work which you get from simply reading Designing Large Language Model Applications. I highly recommend this book."
—Megan Risdal, lead product manager, Kaggle (Google)

"Designing Large Language Model Applications is a master class in building advanced AI systems."
—Jay Alammar, coauthor, Hands-On Large Language Models

"This rare, well-curated book covers all the important ideas and practical know-how that matter in the field."
—Madhav Singhal, CEO, AutoComputer
📄 Page 3
Praise for Designing Large Language Model Applications

Designing Large Language Model Applications is a masterclass in building advanced AI systems. It builds toward a powerful synthesis of advanced methods like tool use, reasoning, RAG, and fine-tuning, equipping readers to create the next generation of AI applications.
—Jay Alammar, coauthor, Hands-On Large Language Models

Designing Large Language Model Applications is a comprehensive tour of LLMs, offering lucid explanations of everything from fundamental concepts like prompting and fine-tuning to emerging trends like inference-time compute and reasoning. But, more importantly, readers will develop genuine intuition for how these models behave in practice. The hands-on exercises help to reinforce these intuitions in creative, engaging ways, which makes this book not just an invaluable reference, but a way for software engineers, ML practitioners, and product managers to build up their own toolkit for developing practical applications with LLMs.
—Megan Risdal, lead product manager, Kaggle (Google)

Designing Large Language Model Applications is a complete, up-to-date guide on the concepts and techniques behind researching, designing, and building large language model applications. Drawing from his deep engineering and research experience, the author provides clear explanations and practical insights on topics across research and industry, enriched with valuable references to prior work and tooling. Thoughtfully crafted exercises help readers build intuition and experimental muscle. A rare, well-curated book that covers all the important ideas and practical know-how that matter in the field.
—Madhav Singhal, CEO, AutoComputer
📄 Page 4
Suhas draws on his rich experience to guide the reader through a comprehensive overview of fundamentals and the newest battle-tested techniques. The timeliness of this practical book will be very useful for a whole new generation of LLM builders.
—Susan Shu Chang, principal data scientist, Elastic

Incredibly comprehensive!
—Nour Fahmy, Flagship RTL
📄 Page 5
Designing Large Language Model Applications
A Holistic Approach to LLMs
Suhas Pai
📄 Page 6
Designing Large Language Model Applications
by Suhas Pai

Copyright © 2025 Suhas Pai. All rights reserved.
ISBN: 978-1-098-15050-1 [LSI]
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Nicole Butterfield
Development Editor: Michele Cronin
Production Editor: Ashley Stussy
Copyeditor: Piper Content Partners
Proofreader: Emily Wydeven
Indexer: Potomac Indexing, LLC
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

March 2025: First Edition
Revision History for the First Edition:
2025-03-06: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098150501 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Designing Large Language Model Applications, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the author and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
This work is part of a collaboration between O’Reilly and Mission Cloud. See our statement of editorial independence.
📄 Page 7
Table of Contents

Preface ... xv

Part I. LLM Ingredients

1. Introduction ... 3
    Defining LLMs ... 4
    A Brief History of LLMs ... 8
    Early Years ... 8
    The Modern LLM Era ... 10
    The Impact of LLMs ... 11
    LLM Usage in the Enterprise ... 14
    Prompting ... 16
    Zero-Shot Prompting ... 18
    Few-Shot Prompting ... 18
    Chain-of-Thought Prompting ... 19
    Prompt Chaining ... 20
    Adversarial Prompting ... 21
    Accessing LLMs Through an API ... 22
    Strengths and Limitations of LLMs ... 24
    Building Your First Chatbot Prototype ... 27
    From Prototype to Production ... 31
    Summary ... 32

2. Pre-Training Data ... 33
    Ingredients of an LLM ... 33
    Pre-Training Data Requirements ... 36
    Popular Pre-Training Datasets ... 39
    Synthetic Pre-Training Data ... 44
    Training Data Preprocessing ... 45
    Data Filtering and Cleaning ... 46
    Selecting Quality Documents ... 51
    Deduplication ... 54
    Removing Personally Identifiable Information ... 57
    Training Set Decontamination ... 62
    Data Mixtures ... 63
    Effect of Pre-Training Data on Downstream Tasks ... 65
    Bias and Fairness Issues in Pre-Training Datasets ... 66
    Summary ... 67

3. Vocabulary and Tokenization ... 69
    Vocabulary ... 69
    Tokenizers ... 74
    Tokenization Pipeline ... 77
    Normalization ... 78
    Pre-Tokenization ... 78
    Tokenization ... 79
    Byte Pair Encoding ... 79
    WordPiece ... 81
    Special Tokens ... 83
    Summary ... 85

4. Architectures and Learning Objectives ... 87
    Preliminaries ... 87
    Representing Meaning ... 89
    The Transformer Architecture ... 91
    Self-Attention ... 93
    Positional Encoding ... 96
    Feedforward Networks ... 97
    Layer Normalization ... 97
    Loss Functions ... 98
    Intrinsic Model Evaluation ... 99
    Transformer Backbones ... 99
    Encoder-Only Architectures ... 101
    Encoder-Decoder Architectures ... 102
    Decoder-Only Architectures ... 102
    Mixture of Experts ... 102
    Learning Objectives ... 104
    Full Language Modeling ... 105
    Prefix Language Modeling ... 109
    Masked Language Modeling ... 109
    Which Learning Objectives Are Better? ... 112
    Pre-Training Models ... 113
    Summary ... 116

Part II. Utilizing LLMs

5. Adapting LLMs to Your Use Case ... 119
    Navigating the LLM Landscape ... 119
    Who Are the LLM Providers? ... 119
    Model Flavors ... 121
    Open Source LLMs ... 125
    How to Choose an LLM for Your Task ... 128
    Open Source Versus Proprietary LLMs ... 129
    LLM Evaluation ... 130
    Loading LLMs ... 137
    Hugging Face Accelerate ... 138
    Ollama ... 138
    LLM Inference APIs ... 139
    Decoding Strategies ... 139
    Greedy Decoding ... 139
    Beam Search ... 140
    Top-k Sampling ... 141
    Top-p Sampling ... 142
    Running Inference on LLMs ... 143
    Structured Outputs ... 144
    Model Debugging and Interpretability ... 146
    Summary ... 148

6. Fine-Tuning ... 149
    The Need for Fine-Tuning ... 149
    Fine-Tuning: A Full Example ... 150
    Learning Algorithms Parameters ... 152
    Memory Optimization Parameters ... 156
    Regularization Parameters ... 157
    Batch Size ... 159
    Parameter-Efficient Fine-Tuning ... 161
    Working with Reduced Precision ... 161
    Putting It All Together ... 162
    Fine-Tuning Datasets ... 164
    Utilizing Publicly Available Instruction-Tuning Datasets ... 166
    LLM-Generated Instruction-Tuning Datasets ... 169
    Summary ... 171

7. Advanced Fine-Tuning Techniques ... 173
    Continual Pre-Training ... 174
    Replay (Memory) ... 177
    Parameter Expansion ... 178
    Parameter-Efficient Fine-Tuning ... 179
    Adding New Parameters ... 180
    Subset Methods ... 185
    Combining Multiple Models ... 186
    Model Ensembling ... 186
    Model Fusion ... 188
    Adapter Merging ... 189
    Summary ... 190

8. Alignment Training and Reasoning ... 191
    Defining Alignment Training ... 191
    Reinforcement Learning ... 192
    Types of Human Feedback ... 192
    RLHF Example ... 193
    Hallucinations ... 195
    Mitigating Hallucinations ... 196
    Self-Consistency ... 198
    Chain-of-Actions ... 198
    Recitation ... 199
    Sampling Methods for Addressing Hallucination ... 200
    Decoding by Contrasting Layers ... 201
    In-Context Hallucinations ... 202
    Hallucinations Due to Irrelevant Information ... 204
    Reasoning ... 205
    Deductive Reasoning ... 205
    Inductive Reasoning ... 205
    Abductive Reasoning ... 206
    Common Sense Reasoning ... 206
    Inducing Reasoning in LLMs ... 207
    Verifiers for Improving Reasoning ... 207
    Inference-Time Computation ... 208
    Fine-Tuning for Reasoning ... 210
    Summary ... 210

9. Inference Optimization ... 211
    LLM Inference Challenges ... 211
    Inference Optimization Techniques ... 212
    Techniques for Reducing Compute ... 212
    K-V Caching ... 213
    Early Exit ... 214
    Knowledge Distillation ... 217
    Techniques for Accelerating Decoding ... 220
    Speculative Decoding ... 221
    Parallel Decoding ... 222
    Techniques for Reducing Storage Needs ... 223
    Symmetric Quantization ... 224
    Asymmetric Quantization ... 225
    Summary ... 226

Part III. LLM Application Paradigms

10. Interfacing LLMs with External Tools ... 229
    LLM Interaction Paradigms ... 230
    Passive Approach ... 231
    The Explicit Approach ... 232
    The Autonomous Approach ... 233
    Defining Agents ... 234
    Agentic Workflow ... 235
    Components of an Agentic System ... 237
    Models ... 238
    Tools ... 239
    Data Stores ... 244
    Agent Loop Prompt ... 247
    Guardrails and Verifiers ... 248
    Agent Orchestration Software ... 256
    Summary ... 257

11. Representation Learning and Embeddings ... 259
    Introduction to Embeddings ... 260
    Semantic Search ... 262
    Similarity Measures ... 264
    Fine-Tuning Embedding Models ... 266
    Base Models ... 266
    Training Dataset ... 267
    Loss Functions ... 268
    Instruction Embeddings ... 270
    Optimizing Embedding Size ... 271
    Matryoshka Embeddings ... 272
    Binary and Integer Embeddings ... 273
    Product Quantization ... 274
    Chunking ... 276
    Sliding Window Chunking ... 277
    Metadata-Aware Chunking ... 277
    Layout-Aware Chunking ... 278
    Semantic Chunking ... 278
    Late Chunking ... 279
    Vector Databases ... 280
    Interpreting Embeddings ... 281
    Summary ... 282

12. Retrieval-Augmented Generation ... 283
    The Need for RAG ... 283
    Typical RAG Scenarios ... 285
    Deciding When to Retrieve ... 285
    The RAG Pipeline ... 287
    Rewrite ... 289
    Retrieve ... 293
    Rerank ... 297
    Refine ... 303
    Insert ... 308
    Generate ... 309
    RAG for Memory Management ... 311
    RAG for Selecting In-Context Training Examples ... 314
    RAG for Model Training ... 314
    Limitations of RAG ... 316
    RAG Versus Long Context ... 317
    RAG Versus Fine-Tuning ... 318
    Summary ... 319

13. Design Patterns and System Architecture ... 321
    Multi-LLM Architectures ... 322
    LLM Cascades ... 322
    Routers ... 324
    Task-Specialized LLMs ... 325
    Programming Paradigms ... 326
    DSPy ... 326
    LMQL ... 329
    Summary ... 329

Index ... 331
📄 Page 15
To The Legend, Kusuma Pai, for showing me how to dream
📄 Pages 17–20

Preface

In the past few years, progress in the field of artificial intelligence has been occurring at breakneck speed, spearheaded by advances in LLMs. It was not too long ago that LLMs were a nascent technology that struggled to generate a coherent paragraph; today they are able to solve complex mathematical problems, write convincing essays, and conduct long, engaging conversations with humans. As AI advances from strength to strength, it is rapidly being woven into the fabric of society, touching many facets of our lives. Learning how to use AI models like LLMs effectively might be one of the most useful skills to learn this decade.

LLMs are revolutionizing the world of software and have made possible the development of applications previously considered impossible. For all the promise that LLMs bring, the reality is that they are still not a mature technology and have many limitations, like deficiencies in reasoning, lack of adherence to factuality, "hallucinations", difficulties in steering them toward our goals, bias and fairness issues, and so on. Despite these limitations, we can still harness LLMs for good use and build a variety of helpful applications, provided we effectively address their shortcomings.

Plenty of software frameworks have emerged that enable rapid prototype development of LLM applications. However, advancing from prototypes to production-grade applications is a road much less traveled, and is still a very challenging task. This is where this book comes in: a holistic overview of the LLM landscape that provides you with the intuition and tools to build complex LLM applications. With this book, my goal is to provide you with an intuitive understanding of how LLMs work, the tools you have at your disposal to harness them, and the various application paradigms they can be built with.

Unique to this book are the exercises; more than 80 exercises are sprinkled throughout to help you solidify your intuitions and sharpen your understanding of what is happening under the hood. While preparing the content of the book, I read over 800 research papers, many of them referenced and linked at appropriate locations in the book, providing you with a jumping-off point for further exploration. All in all, I am confident that you will come out of the book an LLM expert if you read it in its entirety, complete all the exercises, and explore the recommended references.

Who This Book Is For

This book is intended for a broad audience, including software engineers transitioning to AI application development, machine learning practitioners and scientists, and product managers. Much of the content in this book is borne from my own experiments with LLMs, so even if you are an experienced scientist, I expect you will find value in it. Similarly, even if you have very limited exposure to the world of AI, I expect you will still find the book useful for understanding the fundamentals of this technology.

The only prerequisites for this book are knowledge of Python coding and an understanding of basic machine learning and deep learning principles. Where required, I provide links to external resources that you can use to sharpen or develop your prerequisites.

How This Book Is Structured

The book is divided into 3 parts with a total of 13 chapters. The first part deals with understanding the ingredients of a language model. I strongly feel that even though you may never train a language model from scratch yourself, knowing what goes into making it is crucial. The second part discusses various ways to harness language models, be it by directly prompting the model or by fine-tuning it in various ways. It also addresses limitations such as hallucinations and reasoning constraints, along with methods to mitigate these issues. Finally, the third part of the book deals with application paradigms like retrieval-augmented generation (RAG) and agents, positioning LLMs within the broader context of an entire software system.

For an extended table of contents, see my Substack blog post.

What This Book Is Not About

To keep the book at a reasonable length, certain topics were deemed out of scope. I have taken care not to cover topics that I am not confident will stand the test of time. This field is very fast moving, so writing a book that maintains its relevance over time is extremely challenging.

This book focuses only on English-language LLMs and leaves out discussion of multilingual models for the most part. I also disagree with the notion of mushing all the non-English languages of the world under the "multilingual" banner. Every language has its own nuances and deserves its own book.

This book also doesn't cover multimodal models. New models are increasingly multimodal, i.e., a single model supports multiple modalities like text, image, video, speech, etc. However, text remains the most important modality and is the binding substrate in these models. Thus, reading this book will still help you prepare for the multimodal future.

This book does not focus on theory or go too deep into math. There are plenty of other books that cover that, and I have generously linked to them where needed. This book contains minimal math equations and instead focuses on building intuitions.

This book contains only a rudimentary introduction to reasoning models, the latest LLM paradigm. At the time of the book's writing, reasoning models are still in their infancy, and the jury is still out on which techniques will prove to be most effective.

How to Read the Book

The best way to consume this book is to read it sequentially, while working on the exercises and exploring the reference links. That said, there are a few alternative paths, depending on your interests:

• If your interest lies in understanding the LLM landscape and not necessarily in building applications with them, you can focus on Chapters 1, 2, 3, 4, 5, 10, and 11.
• If you are a product manager seeking to understand the scope of possibilities for LLM applications, Chapters 1, 2, 3, 5, 8, 10, 11, 12, and 13 are a good bet.
• If you are an ML scientist, then Chapters 7, 8, 9, 10, 11, and 12 will be sure to give you food for thought and new research challenges.
• If you want to train your own LLM from scratch, Chapters 2, 3, 4, 5, and 7 will provide you with the foundational principles.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://oreil.ly/llm-playbooks.

If you have a technical question or a problem using the code examples, please send email to support@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Designing Large Language Model Applications by Suhas Pai (O'Reilly). Copyright 2025 Suhas Pai, 978-1-098-15050-1."

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.