Uploader: 高宏飞 · Shared on 2025-11-27
Author: Edward Raff, Drew Farris, Stella Biderman

Learn how large language models like GPT and Gemini work under the hood, in plain English. How Large Language Models Work translates years of expert research on large language models into a readable, focused introduction to working with these amazing systems. It clearly explains how LLMs function, introduces the optimization techniques used to fine-tune them, and shows how to create pipelines and processes to ensure your AI applications are efficient and error-free.

In How Large Language Models Work you will learn how to:
- Test and evaluate LLMs
- Use human feedback, supervised fine-tuning, and Retrieval Augmented Generation (RAG)
- Reduce the risk of bad outputs, high-stakes errors, and automation bias
- Design human-computer interaction systems
- Combine LLMs with traditional ML

How Large Language Models Work is authored by top machine learning researchers at Booz Allen Hamilton, including researcher Stella Biderman, Director of AI/ML Research Drew Farris, and Director of Emerging AI Edward Raff. They lay out how LLM and GPT technology works in plain language that's accessible and engaging for all.

What's inside:
- Customize LLMs for specific applications
- Reduce the risk of bad outputs and bias
- Dispel myths about LLMs
- Go beyond language processing

About the reader: No knowledge of ML or AI systems is required.
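Since the list above mentions retrieval augmented generation (RAG) only by name, the short Python sketch below shows the general shape of the pattern: retrieve a few relevant passages, prepend them to the user's question, and hand the combined prompt to a language model. Everything here (the sample documents, `retrieve`, and the stubbed `call_llm`) is invented for illustration and is not code from the book; a production system would use embedding-based search and a real LLM API.

```python
# Minimal sketch of the retrieval augmented generation (RAG) pattern:
# ground the model's answer in retrieved documents rather than relying
# only on what the model memorized during training. All names and
# documents below are made up for illustration.
documents = [
    "Tokenizers split text into subword units called tokens.",
    "Transformer layers turn input embeddings into output embeddings.",
    "RLHF fine-tunes a model using human preference judgments.",
]

def retrieve(question, docs, k=2):
    """Rank documents by naive word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def call_llm(prompt):
    """Placeholder for a real LLM API call (e.g., a hosted chat endpoint)."""
    return f"[model answer based on a prompt of {len(prompt)} characters]"

def answer_with_rag(question):
    # Assemble retrieved context plus the question into one prompt.
    context = "\n".join(retrieve(question, documents))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer_with_rag("What do tokenizers do?"))
```

The design point the book's chapter 5 makes is visible even in this toy: the retrieval step decides what the model is allowed to see, so the quality of the answer depends as much on the retriever as on the LLM itself.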

Tags: none
Publisher: Manning Publications
Publish Year: 2025
Language: English
Pages: 200
File Format: PDF
File Size: 2.2 MB
Text Preview (First 20 pages)

[Cover] How Large Language Models Work — Edward Raff, Drew Farris, Stella Biderman. Manning, for Booz Allen Hamilton.
[Inside cover figure] The process for converting input into output using a large language model: a document is split into sequences and converted to tokens, the tokens pass through the transformer model (word embedding, positional embedding, L repeated transformer layers, unembedding), and the resulting output tokens are sampled and decoded into output text. The following items are references to where each step is explained in detail:
1. Map text to tokens (chapter 2).
2. Map tokens into embedding space (subsection 3.2.1).
3. Add information to each embedding that captures each token's position in the input text (figure 3.7).
4. Pass the data through a transformer layer, repeated L times (subsection 3.2.2).
5. Apply the unembedding layer to get tokens that could make good responses (subsection 3.2.3).
6. Sample from the list of possible tokens to generate a single response (figure 3.11).
7. Decode tokens from the response into actual text (subsection 3.2.3).
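To make the seven steps concrete, here is a minimal, self-contained Python sketch that mirrors the figure's flow with toy sizes and random weights. It is an illustration only, not the book's code: the vocabulary, dimensions, and function names (`tokenize`, `positional_encoding`, `transformer_layer`, `forward`) are invented for this preview, the "tokenizer" is a whitespace split rather than the subword tokenizers of chapter 2, and a real LLM's weights come from training, not a random number generator.

```python
# Toy walk-through of the seven steps in the figure, using random weights.
import numpy as np

rng = np.random.default_rng(0)

# Step 1: map text to tokens (here, a tiny whitespace "tokenizer").
vocab = ["the", "cat", "sat", "on", "mat", "<unk>"]
token_to_id = {t: i for i, t in enumerate(vocab)}

def tokenize(text):
    return [token_to_id.get(w, token_to_id["<unk>"]) for w in text.lower().split()]

d_model = 8                                           # toy embedding width
embedding = rng.normal(size=(len(vocab), d_model))    # step 2: embedding table

def positional_encoding(length, dim):
    # Step 3: add information about each token's position in the input.
    pos = np.arange(length)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / (10000 ** (2 * (i // 2) / dim))
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def transformer_layer(x):
    # Step 4: one heavily simplified self-attention layer with a residual add.
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return x + weights @ x

def forward(text, num_layers=2, temperature=1.0):
    ids = tokenize(text)                              # 1. tokens
    x = embedding[ids]                                # 2. embedding space
    x = x + positional_encoding(len(ids), d_model)    # 3. positions
    for _ in range(num_layers):                       # 4. repeat L layers
        x = transformer_layer(x)
    logits = x[-1] @ embedding.T                      # 5. unembedding scores
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    next_id = rng.choice(len(vocab), p=probs)         # 6. sample one token
    return vocab[next_id]                             # 7. decode back to text

print(forward("The cat sat on the"))
```

Each numbered comment corresponds to a step in the figure; at a high level, swapping the random weights for trained ones and the whitespace split for a learned subword tokenizer is what separates this toy from a real model.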
How Large Language Models Work
How Large Language Models Work — Edward Raff, Drew Farris, Stella Biderman. Manning, Shelter Island.
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact Special Sales Department, Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964. Email: orders@manning.com

© 2025 Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

The author and publisher have made every effort to ensure that the information in this book was correct at press time. The author and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein.

Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964

Development editor: Frances Lefkowitz
Technical editor: Shreesha Jagadeesh
Review editors: Aleksandar Dragosavljevic, Radmila Ercegovac
Production editor: Andy Marinkovich
Copy editor: Alisa Larson
Proofreader: Melody Dolab
Typesetter: Ammar Taha Mohamedy
Cover designer: Marija Tudor

ISBN 9781633437081
Printed in the United States of America
brief contents
1 Big picture: What are LLMs? 1
2 Tokenizers: How large language models see the world 14
3 Transformers: How inputs become outputs 29
4 How LLMs learn 46
5 How do we constrain the behavior of LLMs? 65
6 Beyond natural language processing 88
7 Misconceptions, limits, and eminent abilities of LLMs 107
8 Designing solutions with large language models 125
9 Ethics of building and using LLMs 140
contents
preface x
acknowledgments xii
about this book xiv
about the authors xvii
about the cover illustration xix

1 Big picture: What are LLMs? 1
  1.1 Generative AI in context 2
  1.2 What you will learn 5
  1.3 Introducing how LLMs work 6
  1.4 What is intelligence, anyway? 7
  1.5 How humans and machines represent language differently 9
  1.6 Generative Pretrained Transformers and friends 10
  1.7 Why LLMs perform so well 10
  1.8 LLMs in action: The good, bad, and scary 12

2 Tokenizers: How large language models see the world 14
  2.1 Tokens as numeric representations 15
  2.2 Language models see only tokens 15
      The tokenization process 16 · Controlling vocabulary size in tokenization 18 · Tokenization in detail 20 · The risks of tokenization 22
  2.3 Tokenization and LLM capabilities 24
      LLMs are bad at word games 25 · LLMs are challenged by mathematics 26 · LLMs and language equity 26
  2.4 Check your understanding 27
  2.5 Tokenization in context 27

3 Transformers: How inputs become outputs 29
  3.1 The transformer model 30
      Layers of the transformer model 31
  3.2 Exploring the transformer architecture in detail 33
      Embedding layers 34 · Transformer layers 38 · Unembedding layers 40
  3.3 The tradeoff between creativity and topical responses 43
  3.4 Transformers in context 44

4 How LLMs learn 46
  4.1 Gradient descent 47
      What is a loss function? 47 · What is gradient descent? 51
  4.2 LLMs learn to mimic human text 54
      LLM reward functions 55
  4.3 LLMs and novel tasks 58
      Failing to identify the correct task 60 · LLMs cannot plan 61
  4.4 If LLMs cannot extrapolate well, can I use them? 62
  4.5 Is bigger better? 63

5 How do we constrain the behavior of LLMs? 65
  5.1 Why do we want to constrain behavior? 66
      Base models are not very usable 68 · Not all model outputs are desirable 69 · Some cases require specific formatting 70
  5.2 Fine-tuning: The primary method of changing behavior 70
      Supervised fine-tuning 71 · Reinforcement learning from human feedback 73 · Fine-tuning: The big picture 74
  5.3 The mechanics of RLHF 75
      Beginning with a naive RLHF 75 · The quality reward model 76 · The similar-but-different RLHF objective 77
  5.4 Other factors in customizing LLM behavior 79
      Altering training data 79 · Altering base model training 80 · Altering the outputs 81
  5.5 Integrating LLMs into larger workflows 82
      Customizing LLMs with retrieval augmented generation 82 · General purpose LLM programming 84

6 Beyond natural language processing 88
  6.1 LLMs for software development 90
      Improving LLMs to work with code 92 · Validating code generated by LLMs 93 · Improving code via formatting 94
  6.2 LLMs for formal mathematics 95
      Sanitized input 96 · Helping LLMs understand numbers 97 · Math LLMs also use tools 99
  6.3 Transformers and computer vision 101
      Converting images to patches and back 101 · Multimodal models using images and text 104 · Applicability of prior lessons 105

7 Misconceptions, limits, and eminent abilities of LLMs 107
  7.1 Human rate of learning vs. LLMs 108
      The limitations on self-improvement 111 · Few-shot learning 114
  7.2 Efficiency of work: A 10-watt human brain vs. a 2000-watt computer 115
      Power 115 · Latency, scalability, and availability 116 · Refinement 117
  7.3 Language models are not models of the world 117
  7.4 Computational limits: Hard problems are still hard 120
      Using fuzzy algorithms for fuzzy problems 122 · When close enough is good enough for hard problems 122

8 Designing solutions with large language models 125
  8.1 Just make a chatbot? 126
  8.2 Automation bias 128
      Changing the process 130 · When things are too risky for autonomous LLMs 130
  8.3 Using more than LLMs to reduce risk 132
      Combining LLM embeddings with other tools 132 · Designing a solution that uses embeddings 134
  8.4 Technology presentation matters 136
      How can you be transparent? 137 · Aligning incentives with users 138 · Incorporating feedback cycles 138

9 Ethics of building and using LLMs 140
  9.1 Why did we build LLMs at all? 141
      The pros and cons of LLMs doing everything 142 · Do we want to automate all human work? 144
  9.2 Do LLMs pose an existential risk? 146
      Self-improvement and the iterative S-curve 149 · The alignment problem 150
  9.3 The ethics of data sourcing and reuse 152
      What is fair use? 153 · The challenges associated with compensating content creators 154 · The limitations of public domain data 155
  9.4 Ethical concerns with LLM outputs 156
      Licensing implications for LLM output 157 · Do LLM outputs poison the well? 158
  9.5 Other explorations in LLM ethics 160

References 163
index 169
preface

The skeleton of this book began to come together in the late 2010s, when we saw several significant advancements in the field of artificial intelligence (AI) that we knew could soon lead to a breakthrough. New types of computer hardware, the availability of vast amounts of data, and the growth of neural networks were rapidly converging to a tipping point where it was becoming possible for machine learning algorithms to accurately capture nuances of language and meaning at a surprising level of fidelity. With the right combination of breakthroughs, we knew this would enable an entirely new class of applications. We conducted research, built prototypes, had conversations with our colleagues, clients, and families, and sought to tell the story of how these advancements could change the world and the underlying techniques that made that possible.

Then, at the end of November 2022, OpenAI released ChatGPT, and suddenly, this potential became a reality. Putting this technology into the hands of the public meant that anyone could gain firsthand experience by interacting with a chatbot powered by a large language model (LLM). As with any new technology, there was a lot of speculation as to what could possibly allow ChatGPT to interact with such fidelity and produce such high-quality output. We saw that, based on interactions with ChatGPT, people often assumed that there was more behind the curtain than truly existed, sometimes believing that we were truly on the cusp of general AI that could do anything. We found that our conversations shifted to what could practically be achieved using applications of LLMs, managing expectations, characterizing risks, validating behaviors, and negotiating the path between what's realistic and what's not safe or responsible to attempt.

Fast forward to 2025, and we're now firmly ensconced in the era of generative and agentic AI. We have seen a massive proliferation of models, applications, and capabilities and an explosion in the types of data we can work with.
Each major vendor has a technology offering that incorporates an LLM, whether they are chatbots to talk to or agents that review our writing, help us write computer programs, or generate images. Many of these are controversial, leading to new conversations about data use and causing us to rethink our assumptions about the relationship between technology and creativity. Regardless, there are core principles that enable these applications, and our goal with this book is to describe these in a way that's accessible to readers from all walks of life.

Whether you're a CEO, a machine learning engineer, a casual coder, or just the average person seeking to use this technology, we hope you'll find something useful in this book that explains the algorithms and techniques that make LLMs work. It is a collection of our experiences working in the field of natural language processing, machine learning, and algorithmic research, where we set out to share our knowledge in a manner that is accessible to nearly everyone. Along the way, we will dispel some of the mystery, explain the limitations, and explore the implications of this fascinating new technology. We hope you'll join us on this voyage.
acknowledgments

This book would not be possible without the support of many of our colleagues, collaborators, and countless researchers in the field of artificial intelligence who have chosen to share their explorations of this technology.

We thank our colleagues at Booz Allen Hamilton for their support of this work, including John Larson, Steve Escaravage, Justin Neroda, Catherine Ordun, Jessica Reinhart, and Katrina Jacobs. Andre Nguyen and Matt Keating deserve special recognition for the many conversations on the nature of large language models and ways to think about their safety.

We also want to thank the outstanding staff at Manning Publications, including Frances Lefkowitz, our development editor, and Shreesha Jagadeesh, our technical editor, who both asked the hard questions and shaped and improved the book in so many ways by providing thoughtful feedback. We also thank Andy Waldron, our acquisitions editor; Rebecca Rinehart, our development manager; and Aira Ducic, who led marketing for this book. We also acknowledge Melissa Ice and Radmila Ercegovac, who orchestrated the reviews throughout the writing process, and all of the anonymous reviewers who provided excellent feedback to make this book what it is today.

We owe a special debt to everyone who shepherded this book through the production effort with much patience, including Aleksandar Dragosavljevic and Andy Marinkovich, as well as Alisa Larson for editing and Melody Dolab for the final proofread. Sam Wood and Marija Tudor led the production of our cover, and Azra Dedic led the production of our graphics and figures.

To all the reviewers: Abdullah Al Imran, Adrian M. Rossi, Allan Makura, Ankit Virmani, Bhagvan Kommadi, Cristina-Ioana Casapu, David Cronkite, David Yakobovitch, Doug Puenner, Doyle Turner, Emanuele Piccinelli, Federico Grasso, Florian Braun,
Georg Sommer, George Onofrei, Girish Ahankari, Harsh Ranjan, Holger Voges, Ivan A. Fernandez, Jaganadh Gopinadhan, Jeremy Zeidner, John R. Donoghue, John Guthrie, Jose Morales, Kartik Dutta, Kelvin Chappell, Louis Luangkesorn, Mark Graham, Mattia Zoccarato, Matt Sarmiento, Mikael Dautrey, Mike Taylor, Mostofa Adib Shakib, Neeraj Gupta, Oliver Korten, Raj G, Sashank Dara, Simeon Leyzerzon, Simone Sguazza, Slavomir Furman, Sudharshan Tumkunta, Tony Holdroyd, Vincent Joseph, and Walter Alexander Mata López, your suggestions helped make this a better book.

We also want to thank Al Krinker, a former colleague at Booz Allen Hamilton and our first editor at Manning Publications, who helped us get started in the early days of this work. Finally, and most importantly, we want to thank our families and friends who supported and encouraged us through many nights and weekends working on this book.
about this book

How Large Language Models Work is the culmination of countless hours of research, explorations, conversations, and building and evaluating large language models and the systems that use them to solve problems. It is a distillation of years of working in the fields of machine learning, natural language processing, and software engineering that we, the authors, bring to the table. It's important to us to share what we've learned and break down the complexities of the field into a straightforward conversation that presents foundational details on how LLMs work and builds from there to cover topics that are not widely understood. We seek to dispel some myths and shed light on the realities along the way.

This book does not describe how to implement LLMs like ChatGPT using code. Instead, it covers the foundational concepts that make LLMs operate, as well as the opportunities and limitations of this technology. We'll provide you with an understanding of how the underlying algorithms operate. As a result, you'll better understand why LLMs are implemented the way they are and how LLMs can be used to solve a variety of problems.

Our goal is to translate years of LLM research into something understandable for someone new to the field. To do this, we'll start with the basics to build a foundational understanding of the inner workings of LLMs and then transition to more advanced topics, including adjacent considerations that go beyond LLM operation. Along the way, we'll tackle misconceptions, limitations, and the ethical implications of building and using LLMs, as well as the many ways LLMs can be effectively deployed as technical solutions for challenging problems.
Who should read this book?

This book is intended for a variety of readers, including those who have just started working with LLMs, experienced software developers, and data scientists, as well as technical leadership, decision makers, and executives in the C-suite who face the challenge of developing strategies for incorporating LLMs and generative AI into their businesses. Our goal in writing this book was to create a work that is both accessible and compelling for a broad audience, presenting LLMs in a nontrivial manner.

Perhaps you've previously encountered machine learning, either as a student or hobbyist who took an introduction to machine learning course, but you lack a strong foundation in the field. Perhaps you're someone who has used tools like OpenAI's ChatGPT, Google's Gemini, Anthropic's Claude, or Microsoft's Copilot for work or play and are curious about how these tools generate their results. Regardless of your background or experience, we believe there's something for you in this book. Once you're done, you'll know:

- How LLMs process human language data and how to identify the tasks that may fail when using an LLM
- How data flows through an LLM, the role of transformers and attention, how they operate at a high level, why they are important, and how they relate to other machine learning algorithms
- How LLMs are trained on data, including the concepts of parameters, gradient descent, and pretraining, and why model size is critical
- How to choose a deployment strategy for LLMs in your applications and business
- How to identify tasks and scenarios that LLMs can't realistically solve
- The dangers and ethical concerns of using and building LLMs and where it is appropriate or inappropriate to use them

How this book is organized: A roadmap

In this book, we'll start with the basics of how LLMs process human language, the algorithms that make them possible, and how they learn from data. From there, we'll explore how LLM technology can be applied to tasks beyond text and wrap up with a discussion of LLMs' use and the implications of this technology.

Chapter 1 provides a high-level understanding of LLMs and generative AI in plain language. We explore the differences between how humans and machines work with language and begin to peel back the surface of what makes LLMs so capable, introducing their limitations and potential concerns when using them.

Chapters 2 to 5 delve deep into what's going on under the hood, focusing on the mechanics rather than the math. In chapter 2, we explain how large language models process text so that they can work with it before diving into the internals of how the things we enter into an LLM ultimately lead to the generative output they produce in chapter 3. Chapter 4 discusses how all of this is possible, the process of training an LLM on incredible amounts of text, and why this training can fail to produce the expected outcomes.
Chapter 5 describes how we can control and constrain an LLM and its outputs for specific applications. Chapter 6 looks beyond working with languages and explores the use of LLMs for software development, formal mathematics, and beyond, including text, images, audio, and video.

Now that we've covered the mechanics, chapters 7 to 9 introduce the considerations behind using LLMs in real-world applications. First, we tackle many of the misconceptions, limits, and capabilities of LLMs in chapter 7. In chapter 8, we discuss different scenarios for designing solutions that use LLMs and identify situations where the obvious choices may not be the best options. Any discussion of LLM use wouldn't be complete without covering the ethical implications of building and using LLMs, which we cover in chapter 9. Do LLMs pose an existential risk to humanity? What are the ethics and implications of training on as much data as we can scrape from the internet? Join us on this journey, and you'll discover that along the way, you've become equipped with the knowledge you need for critical thinking about this compelling new technology.

Throughout the book, you'll find many references to other sources of information that go deeper into different aspects of LLMs that we cover. We collect all of these in a references section at the end of the book, providing easy access to the entire list of resources in one place. We encourage you to continue exploring LLMs by visiting these sources and delving deeper into topics that best align with your interests.

liveBook discussion forum

Purchase of How Large Language Models Work includes free access to liveBook, Manning's online reading platform. Using liveBook's exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It's easy to make notes for yourself, ask and answer questions, and receive help from the authors and other users. To access the forum, go to https://livebook.manning.com/book/how-large-language-models-work/discussion.

Manning's commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the authors can take place. It's not a commitment to any specific amount of participation on the part of the authors, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the authors some challenging questions lest their interest stray! The forum and archives of the previous discussions will be accessible from the publisher's website as long as the book is in print.
about the authors

Edward Raff is a Director of Emerging AI at Booz Allen Hamilton, where he leads the machine learning research team. He has worked in healthcare, natural language processing, computer vision, and cybersecurity, as well as fundamental AI/ML research. The author of Inside Deep Learning, Dr. Raff has over 100 published research articles at the top artificial intelligence conferences. He is the author of the Java Statistical Analysis Tool library, a Senior Member of the Association for the Advancement of Artificial Intelligence, and twice chaired the Conference on Applied Machine Learning and Information Technology and the AI for Cyber Security workshop. Dr. Raff's work has been deployed and used by antivirus companies worldwide.

Drew Farris is a Principal at Booz Allen Hamilton. He specializes in artificial intelligence and machine learning, with over 14 years of experience building advanced analytics for public sector clients. Before joining Booz Allen, Drew worked with academic research teams and startups on information retrieval, natural language processing, and large-scale data management platforms. He has co-authored several publications, including Booz Allen's Field Guide to Data Science and Machine Intelligence Primer, and the Jolt Award-winning book Taming Text on computational text processing. Drew is also a member of the Apache Software Foundation and has contributed to open source projects like Apache Mahout, Lucene, and Solr.
Stella Biderman is a machine learning researcher at Booz Allen Hamilton and the executive director of the nonprofit research center EleutherAI. She is a leading advocate for open source artificial intelligence and has trained many of the world's most powerful open source artificial intelligence algorithms. She has a master's degree in computer science from the Georgia Institute of Technology and degrees in Mathematics and Philosophy from the University of Chicago.