MANNING
François Chollet
Matthew Watson
THIRD EDITION
[Inside cover figures]
Machine learning vs. classical programming (chapter 1)
Going from a random model to an overfit model (chapter 5)
Praise for the Second Edition

“Chollet is a master of pedagogy and explains complex concepts with minimal fuss, cutting through the math with practical Python code. He is also an experienced ML researcher, and his insights on various model architectures or training tips are a joy to read.”
—Martin Görner, Google

“Immerse yourself in this exciting introduction to the topic with lots of real-world examples. A must-read for every deep learning practitioner.”
—Sayak Paul, Carted

“The modern classic just got better.”
—Edmon Begoli, Oak Ridge National Laboratory

“Truly the bible of deep learning.”
—Yiannis Paraskevopoulos, University of West Attica

“One of the best books on deep learning with Python.”
—Raushan Jha, Microsoft

“The book is full of insights, useful both for the novice and the more experienced machine learning professional.”
—Viton Vitanis, Viseca Payment Services

“Deep learning well explained, from A to Z.”
—Todd Cook, Appen

“This book really implements the democratization of AI: ‘AI to the people.’”
—Kjell Jansson, GubboIT
Deep Learning with Python, Third Edition
François Chollet
Matthew Watson
MANNING
Shelter Island
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

© 2026 Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

ISBN 9781633436589
Printed in the United States of America

The author and publisher have made every effort to ensure that the information in this book was correct at press time. The author and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein.

Development editor: Ian Hough
Review editor: Dunja Nikitović
Production editor: Kathy Rossland
Copy editor: Alisa Larson
Proofreaders: Katie Tennant and Melody Dolab
Technical proofreader: Gabriel Rasskin
Typesetter: Tamara Švelić Sabljić
Cover designer: Marija Tudor
brief contents

1 ■ What is deep learning?  1
2 ■ The mathematical building blocks of neural networks  16
3 ■ Introduction to TensorFlow, PyTorch, JAX, and Keras  60
4 ■ Classification and regression  104
5 ■ Fundamentals of machine learning  136
6 ■ The universal workflow of machine learning  171
7 ■ A deep dive on Keras  190
8 ■ Image classification  231
9 ■ ConvNet architecture patterns  268
10 ■ Interpreting what ConvNets learn  284
11 ■ Image segmentation  308
12 ■ Object detection  329
13 ■ Timeseries forecasting  351
14 ■ Text classification  381
15 ■ Language models and the Transformer  421
16 ■ Text generation  466
17 ■ Image generation  508
18 ■ Best practices for the real world  538
19 ■ The future of AI  564
20 ■ Conclusions  595
contents

preface  xv
acknowledgments  xvii
about this book  xviii
about the authors  xxi
about the cover illustration  xxii

1 What is deep learning?  1
  1.1 Artificial intelligence, machine learning, and deep learning  2
  1.2 Artificial intelligence  2
  1.3 Machine learning  3
  1.4 Learning rules and representations from data  4
  1.5 The “deep” in “deep learning”  7
  1.6 Understanding how deep learning works, in three figures  8
  1.7 What makes deep learning different  10
  1.8 The age of generative AI  11
  1.9 What deep learning has achieved so far  11
  1.10 Beware of the short-term hype  12
  1.11 Summer can turn to winter  14
  1.12 The promise of AI  14
2 The mathematical building blocks of neural networks  16
  2.1 A first look at a neural network  17
  2.2 Data representations for neural networks  21
      Scalars (rank-0 tensors) 22 ■ Vectors (rank-1 tensors) 22 ■ Matrices (rank-2 tensors) 22 ■ Rank-3 tensors and higher-rank tensors 23 ■ Key attributes 23 ■ Manipulating tensors in NumPy 25 ■ The notion of data batches 25 ■ Real-world examples of data tensors 26
  2.3 The gears of neural networks: Tensor operations  28
      Element-wise operations 29 ■ Broadcasting 30 ■ Tensor product 32 ■ Tensor reshaping 34 ■ Geometric interpretation of tensor operations 35 ■ A geometric interpretation of deep learning 38
  2.4 The engine of neural networks: Gradient-based optimization  39
      What’s a derivative? 41 ■ Derivative of a tensor operation: The gradient 42 ■ Stochastic gradient descent 43 ■ Chaining derivatives: The Backpropagation algorithm 46
  2.5 Looking back at our first example  51
      Reimplementing our first example from scratch 53 ■ Running one training step 55 ■ The full training loop 57 ■ Evaluating the model 58

3 Introduction to TensorFlow, PyTorch, JAX, and Keras  60
  3.1 A brief history of deep learning frameworks  61
  3.2 How these frameworks relate to each other  63
  3.3 Introduction to TensorFlow  63
      First steps with TensorFlow 64 ■ An end-to-end example: A linear classifier in pure TensorFlow 69 ■ What makes the TensorFlow approach unique 74
  3.4 Introduction to PyTorch  74
      First steps with PyTorch 75 ■ An end-to-end example: A linear classifier in pure PyTorch 78 ■ What makes the PyTorch approach unique 81
  3.5 Introduction to JAX  82
      First steps with JAX 82 ■ Tensors in JAX 83 ■ Random number generation in JAX 83 ■ An end-to-end example: A linear classifier in pure JAX 88 ■ What makes the JAX approach unique 90
  3.6 Introduction to Keras  90
      First steps with Keras 91 ■ Layers: The building blocks of deep learning 92 ■ From layers to models 96 ■ The “compile” step: Configuring the learning process 97 ■ Picking a loss function 99 ■ Understanding the fit method 100 ■ Monitoring loss and metrics on validation data 101 ■ Inference: Using a model after training 102

4 Classification and regression  104
  4.1 Classifying movie reviews: A binary classification example  106
      The IMDb dataset 106 ■ Preparing the data 107 ■ Building your model 108 ■ Validating your approach 111 ■ Using a trained model to generate predictions on new data 115 ■ Further experiments 115 ■ Wrapping up 116
  4.2 Classifying newswires: A multiclass classification example  116
      The Reuters dataset 116 ■ Preparing the data 118 ■ Building your model 118 ■ Validating your approach 120 ■ Generating predictions on new data 124 ■ A different way to handle the labels and the loss 124 ■ The importance of having sufficiently large intermediate layers 125 ■ Further experiments 125 ■ Wrapping up 126
  4.3 Predicting house prices: A regression example  126
      The California Housing Price dataset 126 ■ Preparing the data 128 ■ Building your model 128 ■ Validating your approach using K-fold validation 129 ■ Generating predictions on new data 134 ■ Wrapping up 134

5 Fundamentals of machine learning  136
  5.1 Generalization: The goal of machine learning  136
      Underfitting and overfitting 137 ■ The nature of generalization in deep learning 143
  5.2 Evaluating machine-learning models  149
      Training, validation, and test sets 149 ■ Beating a common-sense baseline 152 ■ Things to keep in mind about model evaluation 152
  5.3 Improving model fit  153
      Tuning key gradient descent parameters 153 ■ Using better architecture priors 155 ■ Increasing model capacity 155
  5.4 Improving generalization  158
      Dataset curation 159 ■ Feature engineering 159 ■ Using early stopping 161 ■ Regularizing your model 161

6 The universal workflow of machine learning  171
  6.1 Defining the task  172
      Framing the problem 172 ■ Collecting a dataset 174 ■ Understanding your data 178 ■ Choosing a measure of success 178
  6.2 Developing a model  179
      Preparing the data 179 ■ Choosing an evaluation protocol 180 ■ Beating a baseline 181 ■ Scaling up: Developing a model that overfits 182 ■ Regularizing and tuning your model 183
  6.3 Deploying your model  183
      Explaining your work to stakeholders and setting expectations 184 ■ Shipping an inference model 184 ■ Monitoring your model in the wild 188 ■ Maintaining your model 188

7 A deep dive on Keras  190
  7.1 A spectrum of workflows  191
  7.2 Different ways to build Keras models  192
      The Sequential model 192 ■ The Functional API 195 ■ Subclassing the Model class 202 ■ Mixing and matching different components 204 ■ Remember: Use the right tool for the job 205
  7.3 Using built-in training and evaluation loops  206
      Writing your own metrics 207 ■ Using callbacks 208 ■ Writing your own callbacks 210 ■ Monitoring and visualization with TensorBoard 212
  7.4 Writing your own training and evaluation loops  214
      Training vs. inference 215 ■ Writing custom training step functions 216 ■ Low-level usage of metrics 221 ■ Using fit() with a custom training loop 222 ■ Handling metrics in a custom train_step() 226

8 Image classification  231
  8.1 Introduction to ConvNets  232
      The convolution operation 234 ■ The max-pooling operation 239
  8.2 Training a ConvNet from scratch on a small dataset  241
      The relevance of deep learning for small-data problems 242 ■ Downloading the data 242 ■ Building your model 245 ■ Data preprocessing 247 ■ Using data augmentation 252
  8.3 Using a pretrained model  256
      Feature extraction with a pretrained model 256 ■ Fine-tuning a pretrained model 264

9 ConvNet architecture patterns  268
  9.1 Modularity, hierarchy, and reuse  269
  9.2 Residual connections  272
  9.3 Batch normalization  276
  9.4 Depthwise separable convolutions  278
  9.5 Putting it together: A mini Xception-like model  280
  9.6 Beyond convolution: Vision Transformers  282

10 Interpreting what ConvNets learn  284
  10.1 Visualizing intermediate activations  285
  10.2 Visualizing ConvNet filters  291
      Gradient ascent in TensorFlow 294 ■ Gradient ascent in PyTorch 295 ■ Gradient ascent in JAX 295 ■ The filter visualization loop 296
  10.3 Visualizing heatmaps of class activation  299
      Getting the gradient of the top class: TensorFlow version 302 ■ Getting the gradient of the top class: PyTorch version 302 ■ Getting the gradient of the top class: JAX version 303 ■ Displaying the class activation heatmap 304
  10.4 Visualizing the latent space of a ConvNet  306

11 Image segmentation  308
  11.1 Computer vision tasks  308
      Types of image segmentation 310
  11.2 Training a segmentation model from scratch  311
      Downloading a segmentation dataset 311 ■ Building and training the segmentation model 314
  11.3 Using a pretrained segmentation model  318
      Downloading the Segment Anything Model 319 ■ How Segment Anything works 319 ■ Preparing a test image 321 ■ Prompting the model with a target point 323 ■ Prompting the model with a target box 327

12 Object detection  329
  12.1 Single-stage vs. two-stage object detectors  330
      Two-stage R-CNN detectors 330 ■ Single-stage detectors 332
  12.2 Training a YOLO model from scratch  332
      Downloading the COCO dataset 332 ■ Creating a YOLO model 336 ■ Readying the COCO data for the YOLO model 339 ■ Training the YOLO model 342
  12.3 Using a pretrained RetinaNet detector  346

13 Timeseries forecasting  351
  13.1 Different kinds of timeseries tasks  351
  13.2 A temperature forecasting example  352
      Preparing the data 356 ■ A commonsense, non-machine-learning baseline 359 ■ Let’s try a basic machine learning model 360 ■ Let’s try a 1D convolutional model 362
  13.3 Recurrent neural networks  364
      Understanding recurrent neural networks 365 ■ A recurrent layer in Keras 368 ■ Getting the most out of recurrent neural networks 372 ■ Using recurrent dropout to fight overfitting 372 ■ Stacking recurrent layers 375 ■ Using bidirectional RNNs 377
  13.4 Going even further  379

14 Text classification  381
  14.1 A brief history of natural language processing  381
  14.2 Preparing text data  384
      Character and word tokenization 387 ■ Subword tokenization 390
  14.3 Sets vs. sequences  395
      Loading the IMDb classification dataset 396
  14.4 Set models  398
      Training a bag-of-words model 399 ■ Training a bigram model 403
  14.5 Sequence models  405
      Training a recurrent model 406 ■ Understanding word embeddings 409 ■ Using a word embedding 410 ■ Pretraining a word embedding 414 ■ Using the pretrained embedding for classification 418

15 Language models and the Transformer  421
  15.1 The language model  421
      Training a Shakespeare language model 422 ■ Generating Shakespeare 426
  15.2 Sequence-to-sequence learning  428
      English-to-Spanish translation 430 ■ Sequence-to-sequence learning with RNNs 432
  15.3 The Transformer architecture  437
      Dot-product attention 439 ■ Transformer encoder block 444 ■ Transformer decoder block 446 ■ Sequence-to-sequence learning with a Transformer 448 ■ Embedding positional information 451
  15.4 Classification with a pretrained Transformer  454
      Pretraining a Transformer encoder 454 ■ Loading a pretrained Transformer 455 ■ Preprocessing IMDb movie reviews 458 ■ Fine-tuning a pretrained Transformer 460
  15.5 What makes the Transformer effective?  461

16 Text generation  466
  16.1 A brief history of sequence generation  468
  16.2 Training a mini-GPT  470
      Building the model 473 ■ Pretraining the model 476 ■ Generative decoding 478 ■ Sampling strategies 480
  16.3 Using a pretrained LLM  484
      Text generation with the Gemma model 485 ■ Instruction fine-tuning 488 ■ Low-Rank Adaptation (LoRA) 490
  16.4 Going further with LLMs  495
      Reinforcement Learning with Human Feedback (RLHF) 495 ■ Multimodal LLMs 498 ■ Retrieval Augmented Generation (RAG) 501 ■ “Reasoning” models 502
  16.5 Where are LLMs heading next?  504
17 Image generation  508
  17.1 Deep learning for image generation  508
      Sampling from latent spaces of images 509 ■ Variational autoencoders 510 ■ Implementing a VAE with Keras 513
  17.2 Diffusion models  518
      The Oxford Flowers dataset 520 ■ A U-Net denoising autoencoder 521 ■ The concepts of diffusion time and diffusion schedule 523 ■ The training process 525 ■ The generation process 527 ■ Visualizing results with a custom callback 528 ■ It’s go time! 529
  17.3 Text-to-image models  531
      Exploring the latent space of a text-to-image model 533

18 Best practices for the real world  538
  18.1 Getting the most out of your models  539
      Hyperparameter optimization 539 ■ Model ensembling 546
  18.2 Scaling up model training with multiple devices  548
      Multi-GPU training 548 ■ Distributed training in practice 550 ■ TPU training 555
  18.3 Speeding up training and inference with lower-precision computation  556
      Understanding floating-point precision 556 ■ Float16 inference 558 ■ Mixed-precision training 559 ■ Using loss scaling with mixed precision 559 ■ Beyond mixed precision: float8 training 560 ■ Faster inference with quantization 561

19 The future of AI  564
  19.1 The limitations of deep learning  564
      Deep learning models struggle to adapt to novelty 565 ■ Deep learning models are highly sensitive to phrasing and other distractors 567 ■ Deep learning models struggle to learn generalizable programs 569 ■ The risk of anthropomorphizing machine-learning models 569
  19.2 Scale isn’t all you need  570
      Automatons vs. intelligent agents 571 ■ Local generalization vs. extreme generalization 573 ■ The purpose of intelligence 575 ■ Climbing the spectrum of generalization 575
  19.3 How to build intelligence  576
      The kaleidoscope hypothesis 577 ■ The essence of intelligence: Abstraction acquisition and recombination 578 ■ The importance of setting the right target 578 ■ A new target: On-the-fly adaptation 580 ■ ARC Prize 581 ■ The test-time adaptation era 582 ■ ARC-AGI 2 583
  19.4 The missing ingredients: Search and symbols  584
      The two poles of abstraction 585 ■ Cognition as a combination of both kinds of abstraction 587 ■ Why deep learning isn’t a complete answer to abstraction generation 588 ■ An alternative approach to AI: Program synthesis 589 ■ Blending deep learning and program synthesis 590 ■ Modular component recombination and lifelong learning 592 ■ The long-term vision 593

20 Conclusions  595
  20.1 Key concepts in review  595
      Various approaches to artificial intelligence 596 ■ What makes deep learning special within the field of machine learning 596 ■ How to think about deep learning 597 ■ Key enabling technologies 598 ■ The universal machine learning workflow 599 ■ Key network architectures 600
  20.2 Limitations of deep learning  605
  20.3 What might lie ahead  606
  20.4 Staying up to date in a fast-moving field  607
      Practice on real-world problems using Kaggle 607 ■ Read about the latest developments on arXiv 607 ■ Explore the Keras ecosystem 608
  20.5 Final words  608

index  609
preface

If you’ve picked up this book, you’re probably aware of the extraordinary progress that deep learning has brought to the field of artificial intelligence. In just a handful of years, we went from near-unusable computer vision and natural language processing to highly performant systems deployed at scale in products you use every day. The consequences of this sudden progress extend to almost every industry. Deep learning is applied to a diverse range of important problems across domains as different as medical imaging, agriculture, autonomous driving, education, disaster prevention, and manufacturing. Digital assistants are becoming pervasive on nearly every consumer computing device.

Yet, we believe deep learning is still in its early days. AI is going to represent a much greater wave of disruption than anything that came before it. The technology will continue to make its way to every problem where it can provide some utility—a transformation that will take decades to play out. This is a transformation with profound societal implications, on a global scale, spanning industries and cultures. We strongly believe the only way to ensure such a transformation is beneficial for the people it will affect—all of us—is to radically democratize access to the underlying technology. We need to put understanding of deep learning—how it works, where it fails, and how to apply it—in the hands of as many people as possible, including people who aren’t researchers in the field.

When I wrote the Keras deep learning framework in March 2015, the democratization of AI wasn’t what I had in mind. I had been doing research in machine learning for several years and had built Keras to help with my own experiments. But since then, as newcomers have entered the field of deep learning, many have picked up Keras as their tool of choice. Accessibility quickly became an explicit goal in the development
of Keras, and over the last decade, the Keras developer community has made remarkable progress in this area. We’ve put deep learning into the hands of millions of people, who, in turn, are using it to solve problems that were, until recently, thought to be unsolvable.

The book you’re holding is another step on the way toward broadening access to the field. We aim to make the concepts behind deep learning and their implementation as approachable as possible. Doing so doesn’t require watering anything down; we believe that there are no difficult ideas in deep learning. This book will start with the very basics of the field and, layer by layer, build up to the cutting-edge generative AI models being deployed today.

This is the third edition of Deep Learning with Python. In an effort to make this content as broadly available as possible, we are making this edition available for free, online at https://deeplearningwithpython.io/. We hope you will consider purchasing a physical copy to support the project and read what we consider to be the best presentation of the content. This third edition amounts to a complete rewrite of the entire book, with a broader introduction to today’s popular deep learning frameworks and greatly expanded content on large generative AI models.

We hope this book proves valuable and helps you solve the problems that matter to you.
acknowledgments

First, we’d like to thank the Keras community for making this book possible. Over the past decade, Keras has grown to have thousands of open source contributors and more than 2 million users. Their contributions and feedback have turned Keras into what it is today.

On a personal note, François would like to thank his wife for her endless support during the development of Keras and the writing of this book. Matthew would like to thank his partner Kate, his family, and all the friends who have supported him along the way.

We thank the people at Manning who made this book possible: publisher Marjan Bace and everyone on the editorial and production teams, including Michael Stephens, Aleksandar Dragosavljević, and many others who worked behind the scenes.

Many thanks go to the peer reviewers: Aakash Nain, Abheesht Sharma, Abhishek Shivanna, Aritra Roy Gosthipaty, Avinash Tiwari, Brandon Friar, Christopher Kardell, Edmon Begoli, Guillaume Alleon, Ian Stirk, Jacqueline Nolis, Kishore Reddy, Levi McClenny, Margaret Maynard-Reid, Nilson Chapagain, Prashanth Josyula, Preetish Kakkar, Sai Srinivas Somarouthu, Samuel Marks, Srivathsan Srinivasagopalan, Thiago Britto Borges, Todd Cook, and Varun Chawla—and all the other people who sent us feedback on the draft of the book.

On the coding side, special thanks go to Tomasz Kalinowski, who contributed to code examples in this book; Ian Hough, who served as the book’s development editor; and Gabriel Rasskin, who served as the book’s technical proofreader.
about this book

This book was written for anyone who wishes to explore deep learning from scratch or broaden their understanding of deep learning. Whether you’re a practicing machine learning engineer, a software developer, or a college student, you’ll find value in these pages. You’ll explore deep learning in an approachable way—starting simply and working up to state-of-the-art techniques.

We hope you’ll find that this book strikes a balance between intuition, theory, and hands-on practice. It avoids mathematical notation, preferring instead to explain the core ideas of deep learning via functioning code paired with explanations of the underlying principles. You’ll train machine learning models from scratch in a number of different problem domains and learn practical recommendations for writing deep learning programs and deploying them in the real world.

After reading this book, you’ll have a solid understanding of what deep learning is, when it’s applicable, and what its limitations are. You’ll be familiar with the standard workflow for approaching and solving machine learning problems, and you’ll know how to address commonly encountered issues.

Who should read this book

This book is written for people with some Python programming experience who want to get started with machine learning and deep learning. But this book can also be valuable to many different types of readers:

■ If you’re a data scientist familiar with machine learning, this book will provide you with a solid, practical introduction to deep learning, the fastest-growing and most significant subfield of machine learning.