MANNING
Christopher Brousseau
Matthew Sharp
Foreword by Joe Reis
From language models to successful products
LLMs in Production
LLMs in Production
FROM LANGUAGE MODELS TO SUCCESSFUL PRODUCTS
CHRISTOPHER BROUSSEAU
MATTHEW SHARP
FOREWORD BY JOE REIS
MANNING
SHELTER ISLAND
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact:

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

©2025 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

The authors and publisher have made every effort to ensure that the information in this book was correct at press time. The authors and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Doug Rudder
Technical editor: Daniel Leybzon
Review editor: Dunja Nikitović
Production editor: Aleksandar Dragosavljević
Copy editor: Alisa Larson
Proofreader: Melody Dolab
Technical proofreader: Byron Galbraith
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

ISBN: 9781633437203
Printed in the United States of America
To my wife Jess and my kids, Odin, Magnus, and Emrys, who have supported me through thick and thin.
—Christopher Brousseau

I dedicate this book to Evelyn, my wife, and our daughter, Georgina. Evelyn, thank you for your unwavering support and encouragement through every step of this journey. Your sacrifices have been paramount to making this happen. And to my daughter, you are an endless source of inspiration and motivation. Your smile brightens my day and helps remind me to enjoy the small moments in this world. I hope and believe this book will help build a better tomorrow for both of you.
—Matthew Sharp
brief contents

1 ■ Words’ awakening: Why large language models have captured attention
2 ■ Large language models: A deep dive into language modeling
3 ■ Large language model operations: Building a platform for LLMs
4 ■ Data engineering for large language models: Setting up for success
5 ■ Training large language models: How to generate the generator
6 ■ Large language model services: A practical guide
7 ■ Prompt engineering: Becoming an LLM whisperer
8 ■ Large language model applications: Building an interactive experience
9 ■ Creating an LLM project: Reimplementing Llama 3
10 ■ Creating a coding copilot project: This would have helped you earlier
11 ■ Deploying an LLM on a Raspberry Pi: How low can you go?
12 ■ Production, an ever-changing landscape: Things are just getting started
contents

foreword
preface
acknowledgments
about the book
about the authors
about the cover illustration

1 Words’ awakening: Why large language models have captured attention
  1.1 Large language models accelerating communication
  1.2 Navigating the build-and-buy decision with LLMs
      Buying: The beaten path ■ Building: The path less traveled ■ A word of warning: Embrace the future now
  1.3 Debunking myths

2 Large language models: A deep dive into language modeling
  2.1 Language modeling
      Linguistic features ■ Semiotics ■ Multilingual NLP
  2.2 Language modeling techniques
      N-gram and corpus-based techniques ■ Bayesian techniques ■ Markov chains ■ Continuous language modeling ■ Embeddings ■ Multilayer perceptrons ■ Recurrent neural networks and long short-term memory networks ■ Attention
  2.3 Attention is all you need
      Encoders ■ Decoders ■ Transformers
  2.4 Really big transformers

3 Large language model operations: Building a platform for LLMs
  3.1 Introduction to large language model operations
  3.2 Operations challenges with large language models
      Long download times ■ Longer deploy times ■ Latency ■ Managing GPUs ■ Peculiarities of text data ■ Token limits create bottlenecks ■ Hallucinations cause confusion ■ Bias and ethical considerations ■ Security concerns ■ Controlling costs
  3.3 LLMOps essentials
      Compression ■ Distributed computing
  3.4 LLM operations infrastructure
      Data infrastructure ■ Experiment trackers ■ Model registry ■ Feature stores ■ Vector databases ■ Monitoring system ■ GPU-enabled workstations ■ Deployment service

4 Data engineering for large language models: Setting up for success
  4.1 Models are the foundation
      GPT ■ BLOOM ■ LLaMA ■ Wizard ■ Falcon ■ Vicuna ■ Dolly ■ OpenChat
  4.2 Evaluating LLMs
      Metrics for evaluating text ■ Industry benchmarks ■ Responsible AI benchmarks ■ Developing your own benchmark ■ Evaluating code generators ■ Evaluating model parameters
  4.3 Data for LLMs
      Datasets you should know ■ Data cleaning and preparation
  4.4 Text processors
      Tokenization ■ Embeddings
  4.5 Preparing a Slack dataset

5 Training large language models: How to generate the generator
  5.1 Multi-GPU environments
      Setting up ■ Libraries
  5.2 Basic training techniques
      From scratch ■ Transfer learning (finetuning) ■ Prompting
  5.3 Advanced training techniques
      Prompt tuning ■ Finetuning with knowledge distillation ■ Reinforcement learning with human feedback ■ Mixture of experts ■ LoRA and PEFT
  5.4 Training tips and tricks
      Training data size notes ■ Efficient training ■ Local minima traps ■ Hyperparameter tuning tips ■ A note on operating systems ■ Activation function advice

6 Large language model services: A practical guide
  6.1 Creating an LLM service
      Model compilation ■ LLM storage strategies ■ Adaptive request batching ■ Flow control ■ Streaming responses ■ Feature store ■ Retrieval-augmented generation ■ LLM service libraries
  6.2 Setting up infrastructure
      Provisioning clusters ■ Autoscaling ■ Rolling updates ■ Inference graphs ■ Monitoring
  6.3 Production challenges
      Model updates and retraining ■ Load testing ■ Troubleshooting poor latency ■ Resource management ■ Cost engineering ■ Security
  6.4 Deploying to the edge

7 Prompt engineering: Becoming an LLM whisperer
  7.1 Prompting your model
      Few-shot prompting ■ One-shot prompting ■ Zero-shot prompting
  7.2 Prompt engineering basics
      Anatomy of a prompt ■ Prompting hyperparameters ■ Scrounging the training data
  7.3 Prompt engineering tooling
      LangChain ■ Guidance ■ DSPy ■ Other tooling is available but . . .
  7.4 Advanced prompt engineering techniques
      Giving LLMs tools ■ ReAct

8 Large language model applications: Building an interactive experience
  8.1 Building an application
      Streaming on the frontend ■ Keeping a history ■ Chatbot interaction features ■ Token counting ■ RAG applied
  8.2 Edge applications
  8.3 LLM agents

9 Creating an LLM project: Reimplementing Llama 3
  9.1 Implementing Meta’s Llama
      Tokenization and configuration ■ Dataset, data loading, evaluation, and generation ■ Network architecture
  9.2 Simple Llama
  9.3 Making it better
      Quantization ■ LoRA ■ Fully sharded data parallel–quantized LoRA
  9.4 Deploy to a Hugging Face Hub Space

10 Creating a coding copilot project: This would have helped you earlier
  10.1 Our model
  10.2 Data is king
      Our VectorDB ■ Our dataset ■ Using RAG
  10.3 Build the VS Code extension
  10.4 Lessons learned and next steps

11 Deploying an LLM on a Raspberry Pi: How low can you go?
  11.1 Setting up your Raspberry Pi
      Pi Imager ■ Connecting to Pi ■ Software installations and updates
  11.2 Preparing the model
  11.3 Serving the model
  11.4 Improvements
      Using a better interface ■ Changing quantization ■ Adding multimodality ■ Serving the model on Google Colab

12 Production, an ever-changing landscape: Things are just getting started
  12.1 A thousand-foot view
  12.2 The future of LLMs
      Government and regulation ■ LLMs are getting bigger ■ Multimodal spaces ■ Datasets ■ Solving hallucination ■ New hardware ■ Agents will become useful
  12.3 Final thoughts

appendix A History of linguistics
appendix B Reinforcement learning with human feedback
appendix C Multimodal latent spaces

index
foreword

Unless you’ve been hiding in a cave, you know that LLMs are everywhere. They’re becoming a staple for many people. If you’re reading this book, there’s a good chance you’ve integrated LLMs into your workflow. But you might be wondering how to deploy LLMs in production.

This is precisely why LLMs in Production is a timely and invaluable book. Drawing from their extensive experience and deep expertise in machine learning and linguistics, the authors offer a comprehensive guide to navigating the complexities of bringing LLMs into production environments. They don’t just explore the technical aspects of implementation; they delve into the strategic considerations, ethical implications, and best practices crucial for responsible and effective production deployments of LLMs.

LLMs in Production has it all. Starting with an overview of what LLMs are, the book dives deep into language modeling, MLOps for LLMs, prompt engineering, and every relevant topic in between. You’ll come away with a bottom-up approach to working with LLMs from first principles. This book will stand the test of time, at least as long as possible, in this fast-changing landscape.

You should approach this book with an open mind and a critical eye. The future of LLMs is not predetermined—it will be shaped by the decisions we make and the care with which we implement these powerful tools in production. Let this book guide you as you navigate the exciting, challenging world of LLMs in production.

—Joe Reis, author of Fundamentals of Data Engineering
preface

In January of 2023, I was sitting next to a couple, and they started to discuss the latest phenomenon, ChatGPT. The husband enthusiastically discussed how excited he was about the technology. He had been spending quality time with his teenagers writing a book using it—they had already written 70 pages. The wife, however, wasn’t thrilled so much as scared. She was an English teacher and was worried about how it was going to affect her students.

It was around this time the husband said something I was completely unready for: his friend had fired 100 writers at his company. My jaw dropped. His friend owned a small website where he hired freelance writers to write sarcastic, funny, and fake articles. After being shown the tool, the friend took some of his article titles and asked ChatGPT to write one. What it came up with was indistinguishable from anything else on the website! Meaningless articles that lack the necessity for veracity are LLMs’ bread and butter, so it made sense. It could take him minutes to write hundreds of articles, and it was all free!

We have both experienced this same conversation—with minor changes—a hundred times over since. From groups of college students to close-knit community members, everyone is talking about AI all the time. Very few people have experienced it firsthand, outside of querying a paid API. For years, we’ve seen how it’s been affecting the translation industry. Bespoke translation is difficult to get clients for, and the rise of PEMT (post-editing of machine translation) workflows has allowed translators to charge less and do more work faster, all with a similar level of quality. We’re gunning for LLMs to do the same for many other professions.
When ChatGPT first came out, it was essentially still in beta release for research purposes, and OpenAI hadn’t even announced Plus subscriptions yet. In our time in the industry, we have seen plenty of machine learning models put up behind an API with the release of a white paper. This helps researchers build clout by letting them show off a working demo. However, these demos are just that—never built to scale and usually taken down after a month for cost reasons. OpenAI had done just that on several occasions already. Having already seen the likes of BERT, ELMo, T5, GPT-2, and a host of other language models come and go without any fanfare outside the NLP community, it was clear that GPT-3 was different. LLMs aren’t just popular; they are technically very difficult. There are so many challenges and pitfalls that one can run into when trying to deploy one, and we’ve seen many make those mistakes. So when the opportunity came up to write this book, we were all in. LLMs in Production is the book we always wished we had.
acknowledgments

Before writing this book, we always fantasized about escaping up to the mountains and writing in the seclusion of some cabin in the forest. While that strategy might work for some authors, there’s no way we would have been able to create what we believe to be a fantastic book without the help of so many people. This book had many eyes on it throughout its entire process, and the feedback we’ve received has been fundamental to its creation.

First, we’d like to thank our editors and reviewers: Jonathan Gennick, Al Krinker, Doug Rudder, Sebastian Raschka, and Danny Leybzon. Danny is a data and machine learning expert and worked as a technical editor on this book. He has helped Fortune 500 enterprises and innovative tech startups alike design and implement their data and machine learning strategies. He now does research in reinforcement learning at Universitat Pompeu Fabra in Spain. We thank all of you for your direct commentary and honest criticism. Words can’t describe the depth of our gratitude.

We are also thankful for so many in the community who encouraged us to write this book. There are many who have supported us as mentors, colleagues, and friends. For their encouragement, support, and often promotion of the book, we’d like to thank, in no particular order: Joe Reis, Mary MacCarthy, Lauren Balik, Demetrios Brinkman, Joselito Balleta, Mikolaj Pawlikowski, Abi Aryan, Bryan Verduzco, Fokke Dekker, Monica Kay Royal, Mariah Peterson, Eric Riddoch, Dakota Quibell, Daniel Smith, Isaac Tai, Alex King, Emma Grimes, Shane Smit, Dusty Chadwick, Sonam Choudhary, Isaac Vidas, Olivier Labrèche, Alexandre Gariépy, Amélie Rolland, Alicia Bargar, Vivian Tao, Colin Campbell, Connor Clark, Marc-Antoine Bélanger, Abhin Chhabra, Sylvain Benner, Jordan Mitchell, Benjamin Wilson, Manny Ko, Ben Taylor, Matt Harrison, Jon Bradshaw, Andrew Carr, Brett Ragozzine, Yogesh Sakpal, Gauri Bhatnagar, Sachin Pandey, Vinícius Landeira, Nick Baguely, Cameron Bell, Cody Maughan, Sebastian Quintero, and Will McGinnis. This isn’t a comprehensive list, and we are sure we are forgetting someone. If that’s you, thank you. Please reach out, and we’ll be sure to correct it.

Next, we are so thankful for the entire Manning team, including Aira Dučić, Robin Campbell, Melissa Ice, Ana Romac, Azra Dedic, Ozren Harlović, Dunja Nikitović, Sam Wood, Susan Honeywell, Erik Pillar, Alisa Larson, Melody Dolab, and others.

To all the reviewers—Abdullah Al Imran, Allan Makura, Ananda Roy, Arunkumar Gopalan, Bill Morefield, Blanca Vargas, Bruno Sonnino, Dan Sheikh, Dinesh Chitlangia, George Geevarghese, Gregory Varghese, Harcharan S. Kabbay, Jaganadh Gopinadhan, Janardhan Shetty, Jeremy Bryan, John Williams, Jose San Leandro, Kyle Pollard, Manas Talukdar, Manish Jain, Mehmet Yilmaz, Michael Wang, Nupur Baghel, Ondrej Krajicek, Paul Silisteanu, Peter Henstock, Radhika Kanubaddhi, Reka Anna Horvath, Satej Kumar Sahu, Sergio Govoni, Simon Tschoeke, Simone De Bonis, Simone Sguazza, Siri Varma Vegiraju, Sriram Macharla, Sudhir Maharaj, Sumaira Afzal, Sumit Pal, Supriya Arun, Vinod Sangare, Xiangbo Mao, and Yilun Zhang—your suggestions helped make this a better book.

Lastly, we’d also like to give a special thanks to Elmer Saflor for giving us permission to use the Yellow Balloon meme, and to George Lucas, Hayden Christensen, and Temuera Morrison for being a welcome topic of distraction during many late nights working on the book. “We want to work on Star Wars stuff.”
about the book

LLMs in Production is not your typical data science book. In fact, you won’t find many books like this at all in the data space, mainly because creating a successful data product often requires a large team—data scientists to build models, data engineers to build pipelines, MLOps engineers to build platforms, software engineers to build applications, product managers to go to endless meetings, and, of course, for each of these, managers to take the credit for it all despite their only contribution being to ask questions (oftentimes the same questions repeated) just trying to understand what’s going on. There are so many books geared toward each of these individuals, but very few that tie the entire process together from end to end. While this book focuses on LLMs—indeed, it can be considered an LLMOps book—what you will take away will be so much more than how to push a large model onto a server. You will gain a roadmap that shows you how to create successful ML products—LLMs or otherwise—that delight end users.

Who should read this book

Anyone who finds themselves working on an application that uses LLMs will benefit from this book. This includes all of the previously listed individuals. The individuals who will benefit the most, though, will likely be those in cross-functional roles with titles like ML engineer. This book is hands-on, and we expect our readers to know Python and, in particular, PyTorch.
How this book is organized

There are 12 chapters in this book, 3 of which are project chapters:

Chapter 1 presents some of the promising applications of LLMs and discusses the build-versus-buy dichotomy. This book’s focus is showing you how to build, so we want to help you determine whether building is the right decision for you.

Chapter 2 lays the necessary groundwork. We discuss the basics of linguistics and define some terms you’ll need to understand to get the most out of this book. We then build your knowledge of natural language modeling techniques. By the end of this chapter, you should understand both how LLMs work and what they are good or bad at. You should then be able to determine whether LLMs are the right technology for your project.

Chapter 3 addresses the elephant in the room by explaining why LLMs are so difficult to work with. We’ll then discuss some necessary concepts and solutions you’ll need to master just to start working with LLMs, followed by the tooling and infrastructure requirements you’ll want to acquire and why.

Chapter 4 starts our preparations by discussing the necessary assets you’ll need to acquire, from data to foundation models.

Chapter 5 shows you how to train an LLM from scratch as well as a myriad of methods to finetune your model, going over the pros and cons of each method.

Chapter 6 dives into serving LLMs and what you’ll need to know to create an API. It discusses setting up a VPC for LLMs as well as common production challenges and how to overcome them.

Chapter 7 discusses prompt engineering and how to get the most out of an LLM’s responses.

Chapter 8 examines building an application around an LLM and features you’ll want to consider adding to improve the user experience.

Chapter 9 is the first of our project chapters, where you will build a simple Llama 3 model and deploy it.

Chapter 10 builds a coding copilot that you can use directly in VS Code.

Chapter 11 is a project where we deploy an LLM to a Raspberry Pi.

Chapter 12 ends the book with our thoughts on the future of LLMs as a technology, including discussions of promising fields of research.

In general, this book was designed to be read cover to cover, each chapter building upon the last. To us, the chapters are ordered to mimic an ideal situation and thus outline the knowledge you’ll need and the steps you would go through when building an LLM product under the best circumstances. That said, this is a production book, and production is where reality lives. Don’t worry; we understand the real world is messy. Each chapter is self-contained, and readers are free and encouraged to jump around depending on their interests and levels of understanding.