Lewis Tunstall, Leandro von Werra & Thomas Wolf
Natural Language Processing with Transformers
Building Language Applications with Hugging Face
Revised Edition
MACHINE LEARNING

"The preeminent book for the preeminent transformers library—a model of clarity!"
—Jeremy Howard, Cofounder of fast.ai and professor at University of Queensland

"A wonderfully clear and incisive guide to modern NLP's most essential library. Recommended!"
—Christopher Manning, Thomas M. Siebel Professor in Machine Learning, Stanford University

Natural Language Processing with Transformers
ISBN: 978-1-098-10324-8  US $59.99  CAN $79.99
Twitter: @oreillymedia | linkedin.com/company/oreilly-media | youtube.com/oreillymedia

Since their introduction in 2017, transformers have quickly become the dominant architecture for achieving state-of-the-art results on a variety of natural language processing tasks. If you're a data scientist or coder, this practical book—now revised in full color—shows you how to train and scale these large models using Hugging Face Transformers, a Python-based deep learning library.

Transformers have been used to write realistic news stories, improve Google Search queries, and even create chatbots that tell corny jokes. In this guide, authors Lewis Tunstall, Leandro von Werra, and Thomas Wolf, among the creators of Hugging Face Transformers, use a hands-on approach to teach you how transformers work and how to integrate them in your applications. You'll quickly learn a variety of tasks they can help you solve.

• Build, debug, and optimize transformer models for core NLP tasks, such as text classification, named entity recognition, and question answering
• Learn how transformers can be used for cross-lingual transfer learning
• Apply transformers in real-world scenarios where labeled data is scarce
• Make transformer models efficient for deployment using techniques such as distillation, pruning, and quantization
• Train transformers from scratch and learn how to scale to multiple GPUs and distributed environments

Lewis Tunstall is a machine learning engineer at Hugging Face. His current work focuses on developing tools for the NLP community and teaching people to use them effectively.

Leandro von Werra is a machine learning engineer in the open source team at Hugging Face, where he primarily works on code generation models and community outreach.

Thomas Wolf is chief science officer and cofounder of Hugging Face. His team is on a mission to catalyze and democratize AI research.

ISBN: 978-1-098-13679-6
Praise for Natural Language Processing with Transformers

Pretrained transformer language models have taken the NLP world by storm, while libraries such as Transformers have made them much easier to use. Who better to teach you how to leverage the latest breakthroughs in NLP than the creators of said library? Natural Language Processing with Transformers is a tour de force, reflecting the deep subject matter expertise of its authors in both engineering and research. It is the rare book that offers both substantial breadth and depth of insight and deftly mixes research advances with real-world applications in an accessible way. The book gives informed coverage of the most important methods and applications in current NLP, from multilingual to efficient models and from question answering to text generation. Each chapter provides a nuanced overview grounded in rich code examples that highlights best practices as well as practical considerations and enables you to put research-focused models to impactful real-world use. Whether you're new to NLP or a veteran, this book will improve your understanding and fast-track your development and deployment of state-of-the-art models.
—Sebastian Ruder, Google DeepMind

Transformers have changed how we do NLP, and Hugging Face has pioneered how we use transformers in product and research. Lewis Tunstall, Leandro von Werra, and Thomas Wolf from Hugging Face have written a timely volume providing a convenient and hands-on introduction to this critical topic. The book offers a solid conceptual grounding of transformer mechanics, a tour of the transformer menagerie, applications of transformers, and practical issues in training and bringing transformers to production. Having read chapters in this book, with the depth of its content and lucid presentation, I am confident that this will be the number one resource for anyone interested in learning transformers, particularly for natural language processing.
—Delip Rao, Author of Natural Language Processing and Deep Learning with PyTorch
Complexity made simple. This is a rare and precious book about NLP, transformers, and the growing ecosystem around them, Hugging Face. Whether these are still buzzwords to you or you already have a solid grasp of it all, the authors will navigate you with humor, scientific rigor, and plenty of code examples into the deepest secrets of the coolest technology around. From "off-the-shelf pretrained" to "from-scratch custom" models, and from performance to missing labels issues, the authors address practically every real-life struggle of an ML engineer and provide state-of-the-art solutions, making this book destined to dictate the standards in the field for years to come.
—Luca Perrozzi, PhD, Data Science and Machine Learning Associate Manager at Accenture
Lewis Tunstall, Leandro von Werra, and Thomas Wolf
Foreword by Aurélien Géron

Natural Language Processing with Transformers
Building Language Applications with Hugging Face
REVISED EDITION

Beijing • Boston • Farnham • Sebastopol • Tokyo
Natural Language Processing with Transformers
by Lewis Tunstall, Leandro von Werra, and Thomas Wolf

Copyright © 2022 Lewis Tunstall, Leandro von Werra, and Thomas Wolf. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Rebecca Novack
Development Editor: Melissa Potter
Production Editor: Katherine Tozer
Copyeditor: Rachel Head
Proofreader: Kim Cofer
Indexer: Potomac Indexing, LLC
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Christa Lanz

February 2022: First Edition
May 2022: Revised Color Edition

Revision History for the Revised Edition
2022-05-27: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098136796 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Natural Language Processing with Transformers, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the authors and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-13679-6
[LSI]
Table of Contents

Foreword  xi
Preface  xv

1. Hello Transformers  1
    The Encoder-Decoder Framework  2
    Attention Mechanisms  4
    Transfer Learning in NLP  6
    Hugging Face Transformers: Bridging the Gap  9
    A Tour of Transformer Applications  10
    Text Classification  10
    Named Entity Recognition  11
    Question Answering  12
    Summarization  13
    Translation  13
    Text Generation  14
    The Hugging Face Ecosystem  15
    The Hugging Face Hub  16
    Hugging Face Tokenizers  17
    Hugging Face Datasets  18
    Hugging Face Accelerate  18
    Main Challenges with Transformers  19
    Conclusion  20

2. Text Classification  21
    The Dataset  22
    A First Look at Hugging Face Datasets  23
    From Datasets to DataFrames  26
    Looking at the Class Distribution  27
    How Long Are Our Tweets?  28
    From Text to Tokens  29
    Character Tokenization  29
    Word Tokenization  31
    Subword Tokenization  33
    Tokenizing the Whole Dataset  35
    Training a Text Classifier  36
    Transformers as Feature Extractors  38
    Fine-Tuning Transformers  45
    Conclusion  54

3. Transformer Anatomy  57
    The Transformer Architecture  57
    The Encoder  60
    Self-Attention  61
    The Feed-Forward Layer  70
    Adding Layer Normalization  71
    Positional Embeddings  73
    Adding a Classification Head  75
    The Decoder  76
    Meet the Transformers  78
    The Transformer Tree of Life  78
    The Encoder Branch  79
    The Decoder Branch  82
    The Encoder-Decoder Branch  83
    Conclusion  84

4. Multilingual Named Entity Recognition  87
    The Dataset  88
    Multilingual Transformers  92
    A Closer Look at Tokenization  93
    The Tokenizer Pipeline  94
    The SentencePiece Tokenizer  95
    Transformers for Named Entity Recognition  96
    The Anatomy of the Transformers Model Class  98
    Bodies and Heads  98
    Creating a Custom Model for Token Classification  99
    Loading a Custom Model  101
    Tokenizing Texts for NER  103
    Performance Measures  105
    Fine-Tuning XLM-RoBERTa  106
    Error Analysis  108
    Cross-Lingual Transfer  115
    When Does Zero-Shot Transfer Make Sense?  116
    Fine-Tuning on Multiple Languages at Once  118
    Interacting with Model Widgets  121
    Conclusion  121

5. Text Generation  123
    The Challenge with Generating Coherent Text  125
    Greedy Search Decoding  127
    Beam Search Decoding  130
    Sampling Methods  134
    Top-k and Nucleus Sampling  136
    Which Decoding Method Is Best?  140
    Conclusion  140

6. Summarization  141
    The CNN/DailyMail Dataset  141
    Text Summarization Pipelines  143
    Summarization Baseline  143
    GPT-2  144
    T5  144
    BART  145
    PEGASUS  145
    Comparing Different Summaries  146
    Measuring the Quality of Generated Text  148
    BLEU  148
    ROUGE  152
    Evaluating PEGASUS on the CNN/DailyMail Dataset  154
    Training a Summarization Model  157
    Evaluating PEGASUS on SAMSum  158
    Fine-Tuning PEGASUS  158
    Generating Dialogue Summaries  162
    Conclusion  163

7. Question Answering  165
    Building a Review-Based QA System  166
    The Dataset  167
    Extracting Answers from Text  173
    Using Haystack to Build a QA Pipeline  181
    Improving Our QA Pipeline  189
    Evaluating the Retriever  189
    Evaluating the Reader  196
    Domain Adaptation  199
    Evaluating the Whole QA Pipeline  203
    Going Beyond Extractive QA  205
    Conclusion  207

8. Making Transformers Efficient in Production  209
    Intent Detection as a Case Study  210
    Creating a Performance Benchmark  212
    Making Models Smaller via Knowledge Distillation  217
    Knowledge Distillation for Fine-Tuning  217
    Knowledge Distillation for Pretraining  220
    Creating a Knowledge Distillation Trainer  220
    Choosing a Good Student Initialization  222
    Finding Good Hyperparameters with Optuna  226
    Benchmarking Our Distilled Model  229
    Making Models Faster with Quantization  230
    Benchmarking Our Quantized Model  236
    Optimizing Inference with ONNX and the ONNX Runtime  237
    Making Models Sparser with Weight Pruning  243
    Sparsity in Deep Neural Networks  244
    Weight Pruning Methods  244
    Conclusion  248

9. Dealing with Few to No Labels  249
    Building a GitHub Issues Tagger  251
    Getting the Data  252
    Preparing the Data  253
    Creating Training Sets  257
    Creating Training Slices  259
    Implementing a Naive Bayesline  260
    Working with No Labeled Data  263
    Working with a Few Labels  271
    Data Augmentation  271
    Using Embeddings as a Lookup Table  275
    Fine-Tuning a Vanilla Transformer  284
    In-Context and Few-Shot Learning with Prompts  288
    Leveraging Unlabeled Data  289
    Fine-Tuning a Language Model  289
    Fine-Tuning a Classifier  293
    Advanced Methods  295
    Conclusion  297

10. Training Transformers from Scratch  299
    Large Datasets and Where to Find Them  300
    Challenges of Building a Large-Scale Corpus  300
    Building a Custom Code Dataset  303
    Working with Large Datasets  306
    Adding Datasets to the Hugging Face Hub  309
    Building a Tokenizer  310
    The Tokenizer Model  312
    Measuring Tokenizer Performance  312
    A Tokenizer for Python  313
    Training a Tokenizer  318
    Saving a Custom Tokenizer on the Hub  322
    Training a Model from Scratch  323
    A Tale of Pretraining Objectives  323
    Initializing the Model  325
    Implementing the Dataloader  326
    Defining the Training Loop  330
    The Training Run  337
    Results and Analysis  338
    Conclusion  343

11. Future Directions  345
    Scaling Transformers  345
    Scaling Laws  347
    Challenges with Scaling  349
    Attention Please!  351
    Sparse Attention  352
    Linearized Attention  353
    Going Beyond Text  354
    Vision  355
    Tables  359
    Multimodal Transformers  361
    Speech-to-Text  361
    Vision and Text  364
    Where to from Here?  370

Index  371
Foreword

A miracle is taking place as you read these lines: the squiggles on this page are transforming into words and concepts and emotions as they navigate their way through your cortex. My thoughts from November 2021 have now successfully invaded your brain. If they manage to catch your attention and survive long enough in this harsh and highly competitive environment, they may have a chance to reproduce again as you share these thoughts with others. Thanks to language, thoughts have become airborne and highly contagious brain germs—and no vaccine is coming. Luckily, most brain germs are harmless,¹ and a few are wonderfully useful. In fact, humanity's brain germs constitute two of our most precious treasures: knowledge and culture. Much as we can't digest properly without healthy gut bacteria, we cannot think properly without healthy brain germs. Most of your thoughts are not actually yours: they arose and grew and evolved in many other brains before they infected you. So if we want to build intelligent machines, we will need to find a way to infect them too.

¹ For brain hygiene tips, see CGP Grey's excellent video on memes.

The good news is that another miracle has been unfolding over the last few years: several breakthroughs in deep learning have given birth to powerful language models. Since you are reading this book, you have probably seen some astonishing demos of these language models, such as GPT-3, which given a short prompt such as "a frog meets a crocodile" can write a whole story. Although it's not quite Shakespeare yet, it's sometimes hard to believe that these texts were written by an artificial neural network. In fact, GitHub's Copilot system is helping me write these lines: you'll never know how much I really wrote.

The revolution goes far beyond text generation. It encompasses the whole realm of natural language processing (NLP), from text classification to summarization, translation, question answering, chatbots, natural language understanding (NLU), and more. Wherever there's language, speech or text, there's an application for NLP. You can already ask your phone for tomorrow's weather, or chat with a virtual help desk assistant to troubleshoot a problem, or get meaningful results from search engines that seem to truly understand your query. But the technology is so new that the best is probably yet to come.

Like most advances in science, this recent revolution in NLP rests upon the hard work of hundreds of unsung heroes. But three key ingredients of its success do stand out:

• The transformer is a neural network architecture proposed in 2017 in a groundbreaking paper called "Attention Is All You Need", published by a team of Google researchers. In just a few years it swept across the field, crushing previous architectures that were typically based on recurrent neural networks (RNNs). The Transformer architecture is excellent at capturing patterns in long sequences of data and dealing with huge datasets—so much so that its use is now extending well beyond NLP, for example to image processing tasks.

• In most projects, you won't have access to a huge dataset to train a model from scratch. Luckily, it's often possible to download a model that was pretrained on a generic dataset: all you need to do then is fine-tune it on your own (much smaller) dataset. Pretraining has been mainstream in image processing since the early 2010s, but in NLP it was restricted to contextless word embeddings (i.e., dense vector representations of individual words). For example, the word "bear" had the same pretrained embedding in "teddy bear" and in "to bear." Then, in 2018, several papers proposed full-blown language models that could be pretrained and fine-tuned for a variety of NLP tasks; this completely changed the game.

• Model hubs like Hugging Face's have also been a game-changer. In the early days, pretrained models were just posted anywhere, so it wasn't easy to find what you needed. Murphy's law guaranteed that PyTorch users would only find TensorFlow models, and vice versa. And when you did find a model, figuring out how to fine-tune it wasn't always easy. This is where Hugging Face's Transformers library comes in: it's open source, it supports both TensorFlow and PyTorch, and it makes it easy to download a state-of-the-art pretrained model from the Hugging Face Hub, configure it for your task, fine-tune it on your dataset, and evaluate it. Use of the library is growing quickly: in Q4 2021 it was used by over five thousand organizations and was installed using pip over four million times per month. Moreover, the library and its ecosystem are expanding beyond NLP: image processing models are available too. You can also download numerous datasets from the Hub to train or evaluate your models.

So what more can you ask for? Well, this book! It was written by open source developers at Hugging Face—including the creator of the Transformers library!—and it shows: the breadth and depth of the information you will find in these pages is astounding. It covers everything from the Transformer architecture itself, to the Transformers library and the entire ecosystem around it. I particularly appreciated the hands-on approach: you can follow along in Jupyter notebooks, and all the code examples are straight to the point and simple to understand. The authors have extensive experience in training very large transformer models, and they provide a wealth of tips and tricks for getting everything to work efficiently. Last but not least, their writing style is direct and lively: it reads like a novel.

In short, I thoroughly enjoyed this book, and I'm certain you will too. Anyone interested in building products with state-of-the-art language-processing features needs to read it. It's packed to the brim with all the right brain germs!

—Aurélien Géron
November 2021, Auckland, NZ
Preface

Since their introduction in 2017, transformers have become the de facto standard for tackling a wide range of natural language processing (NLP) tasks in both academia and industry. Without noticing it, you probably interacted with a transformer today: Google now uses BERT to enhance its search engine by better understanding users' search queries. Similarly, the GPT family of models from OpenAI have repeatedly made headlines in mainstream media for their ability to generate human-like text and images.¹ These transformers now power applications like GitHub's Copilot, which, as shown in Figure P-1, can convert a comment into source code that automatically creates a neural network for you!

¹ NLP researchers tend to name their creations after characters in Sesame Street. We'll explain what all these acronyms mean in Chapter 1.

So what is it about transformers that changed the field almost overnight? Like many great scientific breakthroughs, it was the synthesis of several ideas, like attention, transfer learning, and scaling up neural networks, that were percolating in the research community at the time.

But however useful it is, to gain traction in industry any fancy new method needs tools to make it accessible. The Transformers library and its surrounding ecosystem answered that call by making it easy for practitioners to use, train, and share models. This greatly accelerated the adoption of transformers, and the library is now used by over five thousand organizations. Throughout this book we'll guide you on how to train and optimize these models for practical applications.
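To give a concrete feel for that ease of use, here is a minimal sketch (our own illustration, not an excerpt from the book) of running a text-classification model with the library's pipeline API. The model it downloads is simply whatever default the library currently ships for the task:

```python
from transformers import pipeline

# Download a default pretrained model for the task and wrap it in a pipeline
classifier = pipeline("text-classification")

# The pipeline handles tokenization, inference, and post-processing for us
print(classifier("Transformers have changed how we approach NLP."))
# Example output (varies with the default model): [{'label': 'POSITIVE', 'score': 0.99}]
```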
Figure P-1. An example from GitHub Copilot where, given a brief description of the task, the application provides a suggestion for the entire class (everything following class is autogenerated)

Who Is This Book For?

This book is written for data scientists and machine learning engineers who may have heard about the recent breakthroughs involving transformers, but are lacking an in-depth guide to help them adapt these models to their own use cases. The book is not meant to be an introduction to machine learning, and we assume you are comfortable programming in Python and have a basic understanding of deep learning frameworks like PyTorch and TensorFlow. We also assume you have some practical experience with training models on GPUs. Although the book focuses on the PyTorch API of Transformers, Chapter 2 shows you how to translate all the examples to TensorFlow.

The following resources provide a good foundation for the topics covered in this book. We assume your technical knowledge is roughly at their level:

• Hands-On Machine Learning with Scikit-Learn and TensorFlow, by Aurélien Géron (O'Reilly)
• Deep Learning for Coders with fastai and PyTorch, by Jeremy Howard and Sylvain Gugger (O'Reilly)
• Natural Language Processing with PyTorch, by Delip Rao and Brian McMahan (O'Reilly)
• The Hugging Face Course, by the open source team at Hugging Face

What You Will Learn

The goal of this book is to enable you to build your own language applications. To that end, it focuses on practical use cases, and delves into theory only where necessary. The style of the book is hands-on, and we highly recommend you experiment by running the code examples yourself.

The book covers all the major applications of transformers in NLP by having each chapter (with a few exceptions) dedicated to one task, combined with a realistic use case and dataset. Each chapter also introduces some additional concepts. Here's a high-level overview of the tasks and topics we'll cover:

• Chapter 1, Hello Transformers, introduces transformers and puts them into context. It also provides an introduction to the Hugging Face ecosystem.
• Chapter 2, Text Classification, focuses on the task of sentiment analysis (a common text classification problem) and introduces the Trainer API.
• Chapter 3, Transformer Anatomy, dives into the Transformer architecture in more depth, to prepare you for the chapters that follow.
• Chapter 4, Multilingual Named Entity Recognition, focuses on the task of identifying entities in texts in multiple languages (a token classification problem).
• Chapter 5, Text Generation, explores the ability of transformer models to generate text, and introduces decoding strategies and metrics.
• Chapter 6, Summarization, digs into the complex sequence-to-sequence task of text summarization and explores the metrics used for this task.
• Chapter 7, Question Answering, focuses on building a review-based question answering system and introduces retrieval with Haystack.
• Chapter 8, Making Transformers Efficient in Production, focuses on model performance. We'll look at the task of intent detection (a type of sequence classification problem) and explore techniques such as knowledge distillation, quantization, and pruning.
• Chapter 9, Dealing with Few to No Labels, looks at ways to improve model performance in the absence of large amounts of labeled data. We'll build a GitHub issues tagger and explore techniques such as zero-shot classification and data augmentation.
• Chapter 10, Training Transformers from Scratch, shows you how to build and train a model for autocompleting Python source code from scratch. We'll look at dataset streaming and large-scale training, and build our own tokenizer.
• Chapter 11, Future Directions, explores the challenges transformers face and some of the exciting new directions that research in this area is going into.

Transformers offers several layers of abstraction for using and training transformer models. We'll start with the easy-to-use pipelines that allow us to pass text examples through the models and investigate the predictions in just a few lines of code. Then we'll move on to tokenizers, model classes, and the Trainer API, which allow us to train models for our own use cases. Later, we'll show you how to replace the Trainer with the Accelerate library, which gives us full control over the training loop and allows us to train large-scale transformers entirely from scratch! Although each chapter is mostly self-contained, the difficulty of the tasks increases in the later chapters. For this reason, we recommend starting with Chapters 1 and 2, before branching off into the topic of most interest.

Besides Transformers and Accelerate, we will also make extensive use of Datasets, which seamlessly integrates with other libraries. Datasets offers similar functionality for data processing as Pandas but is designed from the ground up for tackling large datasets and machine learning. With these tools, you have everything you need to tackle almost any NLP challenge!

Software and Hardware Requirements

Due to the hands-on approach of this book, we highly recommend that you run the code examples while you read each chapter. Since we're dealing with transformers, you'll need access to a computer with an NVIDIA GPU to train these models. Fortunately, there are several free online options that you can use, including:

• Google Colaboratory
• Kaggle Notebooks
• Paperspace Gradient Notebooks

To run the examples, you'll need to follow the installation guide that we provide in the book's GitHub repository. You can find this guide and the code examples at https://github.com/nlp-with-transformers/notebooks.

We developed most of the chapters using NVIDIA Tesla P100 GPUs, which have 16GB of memory. Some of the free platforms provide GPUs with less memory, so you may need to reduce the batch size when training the models.
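Before deciding on batch sizes, it can be worth checking which GPU your environment exposes and how much memory it offers. The snippet below is a small sketch of one way to do that with PyTorch; it is our own illustration and not part of the book's installation guide:

```python
import torch

# Report the GPU (if any) visible to the current environment. If the memory shown
# is well below 16 GB, consider reducing the batch sizes used in the examples.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name} with {props.total_memory / 1e9:.1f} GB of memory")
else:
    print("No NVIDIA GPU detected; training the examples will be very slow.")
```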