AI Engineering: Building Applications with Foundation Models

Author: Chip Huyen

Category: Technology

Recent breakthroughs in AI have not only increased demand for AI products, they've also lowered the barriers to entry for those who want to build AI products. The model-as-a-service approach has transformed AI from an esoteric discipline into a powerful development tool that anyone can use. Everyone, including those with minimal or no prior AI experience, can now leverage AI models to build applications.

In this book, author Chip Huyen discusses AI engineering: the process of building applications with readily available foundation models. The book starts with an overview of AI engineering, explaining how it differs from traditional ML engineering and discussing the new AI stack. The more AI is used, the more opportunities there are for catastrophic failures, and therefore, the more important evaluation becomes. This book discusses different approaches to evaluating open-ended models, including the rapidly growing AI-as-a-judge approach.

AI application developers will discover how to navigate the AI landscape, including models, datasets, evaluation benchmarks, and the seemingly infinite number of use cases and application patterns. You'll learn a framework for developing an AI application, starting with simple techniques and progressing toward more sophisticated methods, and discover how to efficiently deploy these applications.

Table of Contents:
1. Introduction to Building AI Applications with Foundation Models
2. Understanding Foundation Models
3. Evaluation Methodology
4. Evaluate AI Systems
5. Prompt Engineering
6. RAG and Agents
7. Finetuning
8. Dataset Engineering
9. Inference Optimization
10. AI Engineering Architecture and User Feedback
Epilogue
Index

📄 File Format: PDF
💾 File Size: 31.9 MB

📄 Text Preview (First 20 pages)


📄 Page 1
AI Engineering: Building Applications with Foundation Models
Chip Huyen
📄 Page 2
ISBN: 978-1-098-16630-4    US $79.99    CAN $99.99    DATA

Foundation models have enabled many new AI use cases while lowering the barriers to entry for building AI products. This has transformed AI from an esoteric discipline into a powerful development tool that anyone can use—including those with no prior AI experience. In this accessible guide, author Chip Huyen discusses AI engineering: the process of building applications with readily available foundation models.

AI application developers will discover how to navigate the AI landscape, including models, datasets, evaluation benchmarks, and the seemingly infinite number of application patterns. The book also introduces a practical framework for developing an AI application and efficiently deploying it.

• Understand what AI engineering is and how it differs from traditional machine learning engineering
• Learn the process for developing an AI application, the challenges at each step, and approaches to address them
• Explore various model adaptation techniques, including prompt engineering, RAG, finetuning, agents, and dataset engineering, and understand how and why they work
• Examine the bottlenecks for latency and cost when serving foundation models and learn how to overcome them
• Choose the right model, metrics, data, and developmental patterns for your needs

“This book offers a comprehensive, well-structured guide to the essential aspects of building generative AI systems. A must-read for any professional looking to scale AI across the enterprise.”
—Vittorio Cretella, former global CIO at P&G and Mars

“Chip Huyen gets generative AI. She is a remarkable teacher and writer whose work has been instrumental in helping teams bring AI into production. Drawing on her deep expertise, AI Engineering is a comprehensive and holistic guide to building generative AI applications in production.”
—Luke Metz, cocreator of ChatGPT, former research manager at OpenAI

Chip Huyen works at the intersection of AI, data, and storytelling. Previously, she was with Snorkel AI and NVIDIA, founded an AI infrastructure startup (acquired), and taught machine learning systems design at Stanford. Her book Designing Machine Learning Systems (O’Reilly) has been translated into over 10 languages.
📄 Page 3
Praise for AI Engineering

This book offers a comprehensive, well-structured guide to the essential aspects of building generative AI systems. A must-read for any professional looking to scale AI across the enterprise.
—Vittorio Cretella, former global CIO, P&G and Mars

Chip Huyen gets generative AI. On top of that, she is a remarkable teacher and writer whose work has been instrumental in helping teams bring AI into production. Drawing on her deep expertise, AI Engineering serves as a comprehensive and holistic guide, masterfully detailing everything required to design and deploy generative AI applications in production.
—Luke Metz, cocreator of ChatGPT, former research manager at OpenAI

Every AI engineer building real-world applications should read this book. It’s a vital guide to end-to-end AI system design, from model development and evaluation to large-scale deployment and operation.
—Andrei Lopatenko, Director, Search and AI, Neuron7

This book serves as an essential guide for building AI products that can scale. Unlike other books that focus on tools or current trends that are constantly changing, Chip delivers timeless foundational knowledge. Whether you’re a product manager or an engineer, this book effectively bridges the collaboration gap between cross-functional teams, making it a must-read for anyone involved in AI development.
—Aileen Bui, AI Product Operations Manager, Google
📄 Page 4
This is the definitive segue into AI engineering from one of the greats of ML engineering! Chip has seen through successful projects and careers at every stage of a company and for the first time ever condensed her expertise for new AI engineers entering the field.
—swyx, Curator, AI.Engineer

AI Engineering is a practical guide that provides the most up-to-date information on AI development, making it approachable for novice and expert leaders alike. This book is an essential resource for anyone looking to build robust and scalable AI systems.
—Vicki Reyzelman, Chief AI Solutions Architect, Mave Sparks

AI Engineering is a comprehensive guide that serves as an essential reference for both understanding and implementing AI systems in practice.
—Han Lee, Director, Data Science, Moody’s

AI Engineering is an essential guide for anyone building software with generative AI! It demystifies the technology, highlights the importance of evaluation, and shares what should be done to achieve quality before starting with costly fine-tuning.
—Rafal Kawala, Senior AI Engineering Director, 16 years of experience working in a Fortune 500 company
📄 Page 5
AI Engineering: Building Applications with Foundation Models
Chip Huyen
📄 Page 6
AI Engineering
by Chip Huyen

Copyright © 2025 Developer Experience Advisory LLC. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Nicole Butterfield
Development Editor: Melissa Potter
Production Editor: Beth Kelly
Copyeditor: Liz Wheeler
Proofreader: Piper Editorial Consulting, LLC
Indexer: WordCo Indexing Services, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

December 2024: First Edition

Revision History for the First Edition
2024-12-04: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098166304 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. AI Engineering, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

ISBN: 978-1-098-16630-4 [LSI]
📄 Page 7
Table of Contents

Preface  xi

1. Introduction to Building AI Applications with Foundation Models  1
    The Rise of AI Engineering  2
        From Language Models to Large Language Models  2
        From Large Language Models to Foundation Models  8
        From Foundation Models to AI Engineering  12
    Foundation Model Use Cases  16
        Coding  20
        Image and Video Production  22
        Writing  22
        Education  24
        Conversational Bots  26
        Information Aggregation  26
        Data Organization  27
        Workflow Automation  28
    Planning AI Applications  28
        Use Case Evaluation  29
        Setting Expectations  32
        Milestone Planning  33
        Maintenance  34
    The AI Engineering Stack  35
        Three Layers of the AI Stack  37
        AI Engineering Versus ML Engineering  39
        AI Engineering Versus Full-Stack Engineering  46
    Summary  47
📄 Page 8
2. Understanding Foundation Models  49
    Training Data  50
        Multilingual Models  51
        Domain-Specific Models  56
    Modeling  58
        Model Architecture  58
        Model Size  67
    Post-Training  78
        Supervised Finetuning  80
        Preference Finetuning  83
    Sampling  88
        Sampling Fundamentals  88
        Sampling Strategies  90
        Test Time Compute  96
        Structured Outputs  99
    The Probabilistic Nature of AI  105
    Summary  111

3. Evaluation Methodology  113
    Challenges of Evaluating Foundation Models  114
    Understanding Language Modeling Metrics  118
        Entropy  119
        Cross Entropy  120
        Bits-per-Character and Bits-per-Byte  121
        Perplexity  121
        Perplexity Interpretation and Use Cases  122
    Exact Evaluation  125
        Functional Correctness  126
        Similarity Measurements Against Reference Data  127
        Introduction to Embedding  134
    AI as a Judge  136
        Why AI as a Judge?  137
        How to Use AI as a Judge  138
        Limitations of AI as a Judge  141
        What Models Can Act as Judges?  145
    Ranking Models with Comparative Evaluation  148
        Challenges of Comparative Evaluation  152
        The Future of Comparative Evaluation  155
    Summary  156
📄 Page 9
4. Evaluate AI Systems  159
    Evaluation Criteria  160
        Domain-Specific Capability  161
        Generation Capability  163
        Instruction-Following Capability  172
        Cost and Latency  177
    Model Selection  179
        Model Selection Workflow  179
        Model Build Versus Buy  181
        Navigate Public Benchmarks  191
    Design Your Evaluation Pipeline  200
        Step 1. Evaluate All Components in a System  200
        Step 2. Create an Evaluation Guideline  202
        Step 3. Define Evaluation Methods and Data  204
    Summary  208

5. Prompt Engineering  211
    Introduction to Prompting  212
        In-Context Learning: Zero-Shot and Few-Shot  213
        System Prompt and User Prompt  215
        Context Length and Context Efficiency  218
    Prompt Engineering Best Practices  220
        Write Clear and Explicit Instructions  220
        Provide Sufficient Context  223
        Break Complex Tasks into Simpler Subtasks  224
        Give the Model Time to Think  227
        Iterate on Your Prompts  229
        Evaluate Prompt Engineering Tools  230
        Organize and Version Prompts  233
    Defensive Prompt Engineering  235
        Proprietary Prompts and Reverse Prompt Engineering  236
        Jailbreaking and Prompt Injection  238
        Information Extraction  243
        Defenses Against Prompt Attacks  248
    Summary  251

6. RAG and Agents  253
    RAG  253
        RAG Architecture  256
        Retrieval Algorithms  257
        Retrieval Optimization  267
📄 Page 10
        RAG Beyond Texts  273
    Agents  275
        Agent Overview  276
        Tools  278
        Planning  281
        Agent Failure Modes and Evaluation  298
    Memory  300
    Summary  305

7. Finetuning  307
    Finetuning Overview  308
    When to Finetune  311
        Reasons to Finetune  311
        Reasons Not to Finetune  312
        Finetuning and RAG  316
    Memory Bottlenecks  319
        Backpropagation and Trainable Parameters  320
        Memory Math  322
        Numerical Representations  325
        Quantization  328
    Finetuning Techniques  332
        Parameter-Efficient Finetuning  332
        Model Merging and Multi-Task Finetuning  347
        Finetuning Tactics  357
    Summary  361

8. Dataset Engineering  363
    Data Curation  365
        Data Quality  368
        Data Coverage  369
        Data Quantity  372
        Data Acquisition and Annotation  377
    Data Augmentation and Synthesis  380
        Why Data Synthesis  381
        Traditional Data Synthesis Techniques  383
        AI-Powered Data Synthesis  386
        Model Distillation  395
    Data Processing  396
        Inspect Data  397
        Deduplicate Data  399
        Clean and Filter Data  401
📄 Page 11
        Format Data  401
    Summary  403

9. Inference Optimization  405
    Understanding Inference Optimization  406
        Inference Overview  406
        Inference Performance Metrics  412
        AI Accelerators  419
    Inference Optimization  426
        Model Optimization  426
        Inference Service Optimization  440
    Summary  447

10. AI Engineering Architecture and User Feedback  449
    AI Engineering Architecture  449
        Step 1. Enhance Context  450
        Step 2. Put in Guardrails  451
        Step 3. Add Model Router and Gateway  456
        Step 4. Reduce Latency with Caches  460
        Step 5. Add Agent Patterns  463
        Monitoring and Observability  465
        AI Pipeline Orchestration  472
    User Feedback  474
        Extracting Conversational Feedback  475
        Feedback Design  480
        Feedback Limitations  490
    Summary  492

Epilogue  495

Index  497
📄 Page 13
Preface

When ChatGPT came out, like many of my colleagues, I was disoriented. What surprised me wasn’t the model’s size or capabilities. For over a decade, the AI community has known that scaling up a model improves it. In 2012, the AlexNet authors noted in their landmark paper: “All of our experiments suggest that our results can be improved simply by waiting for faster GPUs and bigger datasets to become available.”¹ ²

What surprised me was the sheer number of applications this capability boost unlocked. I thought a small increase in model quality metrics might result in a modest increase in applications. Instead, it resulted in an explosion of new possibilities.

Not only have these new AI capabilities increased the demand for AI applications, but they have also lowered the entry barrier for developers. It’s become so easy to get started with building AI applications. It’s even possible to build an application without writing a single line of code. This shift has transformed AI from a specialized discipline into a powerful development tool everyone can use.

Even though AI adoption today seems new, it’s built upon techniques that have been around for a while. Papers about language modeling came out as early as the 1950s. Retrieval-augmented generation (RAG) applications are built upon retrieval technology that has powered search and recommender systems since long before the term RAG was coined. The best practices for deploying traditional machine learning applications—systematic experimentation, rigorous evaluation, relentless optimization for faster and cheaper models—are still the best practices for working with foundation model-based applications.

¹ An author of the AlexNet paper, Ilya Sutskever, went on to cofound OpenAI, turning this lesson into reality with GPT models.
² Even my small project in 2017, which used a language model to evaluate translation quality, concluded that we needed “a better language model.”
📄 Page 14
The familiarity and ease of use of many AI engineering techniques can mislead people into thinking there is nothing new to AI engineering. But while many principles for building AI applications remain the same, the scale and improved capabilities of AI models introduce opportunities and challenges that require new solutions.

This book covers the end-to-end process of adapting foundation models to solve real-world problems, encompassing tried-and-true techniques from other engineering fields and techniques emerging with foundation models.

I set out to write the book because I wanted to learn, and I did learn a lot. I learned from the projects I worked on, the papers I read, and the people I interviewed. During the process of writing this book, I used notes from over 100 conversations and interviews, including researchers from major AI labs (OpenAI, Google, Anthropic, ...), framework developers (NVIDIA, Meta, Hugging Face, Anyscale, LangChain, LlamaIndex, ...), executives and heads of AI/data at companies of different sizes, product managers, community researchers, and independent application developers (see “Acknowledgments” on page xx).

I especially learned from early readers who tested my assumptions, introduced me to different perspectives, and exposed me to new problems and approaches. Some sections of the book have also received thousands of comments from the community after being shared on my blog, many giving me new perspectives or confirming a hypothesis.

I hope that this learning process will continue for me now that the book is in your hands, as you have experiences and perspectives that are unique to you. Please feel free to share any feedback you might have for this book with me via X, LinkedIn, or email at hi@huyenchip.com.

What This Book Is About

This book provides a framework for adapting foundation models, which include both large language models (LLMs) and large multimodal models (LMMs), to specific applications.
There are many different ways to build an application. This book outlines various solutions and also raises questions you can ask to evaluate the best solution for your needs. Some of the many questions that this book can help you answer are:

• Should I build this AI application?
• How do I evaluate my application? Can I use AI to evaluate AI outputs?
• What causes hallucinations? How do I detect and mitigate hallucinations?
• What are the best practices for prompt engineering?
• Why does RAG work? What are the strategies for doing RAG?
📄 Page 15
• What’s an agent? How do I build and evaluate an agent?
• When to finetune a model? When not to finetune a model?
• How much data do I need? How do I validate the quality of my data?
• How do I make my model faster, cheaper, and more secure?
• How do I create a feedback loop to improve my application continually?

The book will also help you navigate the overwhelming AI landscape: types of models, evaluation benchmarks, and a seemingly infinite number of use cases and application patterns.

The content in this book is illustrated using case studies, many of which I worked on, backed by ample references and extensively reviewed by experts from a wide range of backgrounds. Although the book took two years to write, it draws from my experience working with language models and ML systems over the last decade.

Like my previous O’Reilly book, Designing Machine Learning Systems (DMLS), this book focuses on the fundamentals of AI engineering instead of any specific tool or API. Tools become outdated quickly, but fundamentals should last longer.³

Reading AI Engineering (AIE) with Designing Machine Learning Systems (DMLS)

AIE can be a companion to DMLS. DMLS focuses on building applications on top of traditional ML models, which involves more tabular data annotations, feature engineering, and model training. AIE focuses on building applications on top of foundation models, which involves more prompt engineering, context construction, and parameter-efficient finetuning. Both books are self-contained and modular, so you can read either book independently.

Since foundation models are ML models, some concepts are relevant to working with both. If a topic is relevant to AIE but has been discussed extensively in DMLS, it’ll still be covered in this book, but to a lesser extent, with pointers to relevant resources.

³ Teaching a course on how to use TensorFlow in 2017 taught me a painful lesson about how quickly tools and tutorials become outdated.
Note that many topics are covered in DMLS but not in AIE, and vice versa. The first chapter of this book also covers the differences between traditional ML engineering and AI engineering. A real-world system often involves both traditional ML models and foundation models, so knowledge about working with both is often necessary.
📄 Page 16
Determining whether something will last, however, is often challenging. I relied on three criteria. First, for a problem, I determined whether it results from the fundamental limitations of how AI works or whether it will go away with better models. If a problem is fundamental, I’ll analyze its challenges and solutions to address each challenge. I’m a fan of the start-simple approach, so for many problems, I’ll start from the simplest solution and then progress to more complex solutions that address rising challenges.

Second, I consulted an extensive network of researchers and engineers, who are smarter than I am, about what they think are the most important problems and solutions.

Occasionally, I also relied on Lindy’s Law, which suggests that the future life expectancy of a technology is proportional to its current age. So if something has been around for a while, I assume that it’ll continue existing for a while longer.

In this book, however, I occasionally included a concept that I believe to be temporary because it’s immediately useful for some application developers or because it illustrates an interesting problem-solving approach.

What This Book Is Not

This book isn’t a tutorial. While it mentions specific tools and includes pseudocode snippets to illustrate certain concepts, it doesn’t teach you how to use a tool. Instead, it offers a framework for selecting tools. It includes many discussions on the trade-offs between different solutions and the questions you should ask when evaluating a solution. When you want to use a tool, it’s usually easy to find tutorials for it online. AI chatbots are also pretty good at helping you get started with popular tools.

This book isn’t an ML theory book. It doesn’t explain what a neural network is or how to build and train a model from scratch.
While it explains many theoretical concepts immediately relevant to the discussion, this is a practical book that focuses on helping you build successful AI applications to solve real-world problems.

While it’s possible to build foundation model-based applications without ML expertise, a basic understanding of ML and statistics can help you build better applications and save you from unnecessary suffering. You can read this book without any prior ML background. However, you will be more effective while building AI applications if you know the following concepts:

• Probabilistic concepts such as sampling, determinism, and distribution.
• ML concepts such as supervision, self-supervision, log-likelihood, gradient descent, backpropagation, loss function, and hyperparameter tuning.
📄 Page 17
• Various neural network architectures, including feedforward, recurrent, and transformer.
• Metrics such as accuracy, F1, precision, recall, cosine similarity, and cross entropy.

If you don’t know them yet, don’t worry—this book has either brief, high-level explanations or pointers to resources that can get you up to speed.

Who This Book Is For

This book is for anyone who wants to leverage foundation models to solve real-world problems. This is a technical book, so the language is geared toward technical roles, including AI engineers, ML engineers, data scientists, engineering managers, and technical product managers. This book is for you if you can relate to one of the following scenarios:

• You’re building or optimizing an AI application, whether you’re starting from scratch or looking to move beyond the demo phase into a production-ready stage. You may also be facing issues like hallucinations, security, latency, or costs, and need targeted solutions.
• You want to streamline your team’s AI development process, making it more systematic, faster, and reliable.
• You want to understand how your organization can leverage foundation models to improve the business’s bottom line and how to build a team to do so.

You can also benefit from the book if you belong to one of the following groups:

• Tool developers who want to identify underserved areas in AI engineering to position their products in the ecosystem.
• Researchers who want to better understand AI use cases.
• Job candidates seeking clarity on the skills needed to pursue a career as an AI engineer.
• Anyone wanting to better understand AI’s capabilities and limitations, and how it might affect different roles.

I love getting to the bottom of things, so some sections dive a bit deeper into the technical side. While many early readers like the detail, it might not be for everyone. I’ll give you a heads-up before things get too technical.
Feel free to skip ahead if it feels a little too in the weeds!
📄 Page 18
Navigating This Book

This book is structured to follow the typical process for developing an AI application. Here’s what this typical process looks like and how each chapter fits into it. Because this book is modular, you’re welcome to skip any section that you’re already familiar with or that is less relevant to you.

Before deciding to build an AI application, it’s necessary to understand what this process involves and to answer questions such as: Is this application necessary? Is AI needed? Do I have to build this application myself? The first chapter of the book helps you answer these questions. It also covers a range of successful use cases to give you a sense of what foundation models can do.

While an ML background is not necessary to build AI applications, understanding how a foundation model works under the hood is useful to make the most of it. Chapter 2 analyzes the making of a foundation model and the design decisions with significant impacts on downstream applications, including its training data recipe, model architectures and scales, and how the model is trained to align with human preference. It then discusses how a model generates a response, which helps explain the model’s seemingly baffling behaviors, like inconsistency and hallucinations. Changing the generation settings of a model is also often a cheap and easy way to significantly boost the model’s performance.

Once you’ve committed to building an application with foundation models, evaluation will be an integral part of every step along the way. Evaluation is one of the hardest, if not the hardest, challenges of AI engineering. This book dedicates two chapters, Chapters 3 and 4, to exploring different evaluation methods and how to use them to create a reliable and systematic evaluation pipeline for your application.
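To give a flavor of the generation settings mentioned above, here is a small sketch in pure Python. The logits and three-token vocabulary are made up for illustration; real models apply the same transformation over vocabularies of tens of thousands of tokens. It shows how temperature, one common generation setting, reshapes the next-token probability distribution before a token is sampled:

```python
# Illustrative sketch: how temperature reshapes a next-token distribution.
# The logits below are hypothetical values for three candidate tokens.
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into probabilities, scaled by temperature.

    Lower temperature sharpens the distribution (more deterministic output);
    higher temperature flattens it (more diverse output).
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical logits for three candidate tokens

cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)

print([round(p, 3) for p in cold])  # top token dominates
print([round(p, 3) for p in hot])   # probabilities are much closer together
```

A setting like this, exposed by most model APIs, is often the cheapest knob to turn: no prompt changes and no finetuning, yet it visibly shifts behavior between consistent and creative outputs.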
Given a query, the quality of a model’s response depends on the following aspects (outside of the model’s generation settings):

• The instructions for how the model should behave
• The context the model can use to respond to the query
• The model itself

The next three chapters of the book focus on how to optimize each of these aspects to improve a model’s performance for an application. Chapter 5 covers prompt engineering, starting with what a prompt is, why prompt engineering works, and prompt engineering best practices. It then discusses how bad actors can exploit your application with prompt attacks and how to defend your application against them.

Chapter 6 explores why context is important for a model to generate accurate responses. It zooms into two major application patterns for context construction: RAG and agentic. The RAG pattern is better understood and has proven to work well in
📄 Page 19
production. On the other hand, while the agentic pattern promises to be much more powerful, it’s also more complex and is still being explored.

Chapter 7 is about how to adapt a model to an application by changing the model itself with finetuning. Due to the scale of foundation models, native model finetuning is memory-intensive, and many techniques have been developed to allow finetuning better models with less memory. The chapter covers different finetuning approaches, supplemented by a more experimental approach: model merging. This chapter contains a more technical section that shows how to calculate the memory footprint of a model.

Due to the availability of many finetuning frameworks, the finetuning process itself is often straightforward. However, getting data for finetuning is hard. The next chapter is all about data, including data acquisition, data annotation, data synthesis, and data processing. Many of the topics discussed in Chapter 8 are relevant beyond finetuning, including the question of what data quality means and how to evaluate the quality of your data.

If Chapters 5 to 8 are about improving a model’s quality, Chapter 9 is about making its inference cheaper and faster. It discusses optimization both at the model level and at the inference service level. If you’re using a model API—i.e., someone else hosts your model for you—this API will likely take care of inference optimization for you. However, if you host the model yourself—either an open source model or a model developed in-house—you’ll need to implement many of the techniques discussed in this chapter.

The last chapter of the book brings together the different concepts from this book to build an application end-to-end. The second part of the chapter is more product-focused, with discussions on how to design a user feedback system that helps you collect useful feedback while maintaining a good user experience.

I often use “we” in this book to mean you (the reader) and I.
It’s a habit I got from my teaching days, as I saw writing as a shared learning experience for both the writer and the readers.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.
📄 Page 20
Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, input prompts into models, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/chiphuyen/aie-book. The repository contains additional resources about AI engineering, including important papers and helpful tools. It also covers topics that are too deep to go into in this book. For those interested in the process of writing this book, the GitHub repository also contains behind-the-scenes information and statistics about the book.

If you have a technical question or a problem using the code examples, please send email to support@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from
