Deep Learning for Vision Systems

Author: Mohamed Elgendy

Category: Science

Computer vision is central to many leading-edge innovations, including self-driving cars, drones, augmented reality, facial recognition, and much, much more. Amazing new computer vision applications are developed every day, thanks to rapid advances in AI and deep learning (DL). Deep Learning for Vision Systems teaches you the concepts and tools for building intelligent, scalable computer vision systems that can identify and react to objects in images, videos, and real life. With author Mohamed Elgendy's expert instruction and illustration of real-world projects, you'll finally grok state-of-the-art deep learning techniques, so you can build, contribute to, and lead in the exciting realm of computer vision!

About the technology
How much has computer vision advanced? One ride in a Tesla is the only answer you'll need. Deep learning techniques have led to exciting breakthroughs in facial recognition, interactive simulations, and medical imaging, but nothing beats seeing a car respond to real-world stimuli while speeding down the highway.

About the book
How does the computer learn to understand what it sees? Deep Learning for Vision Systems answers that by applying deep learning to computer vision. Using only high school algebra, this book illuminates the concepts behind visual intuition. You'll understand how to use deep learning architectures to build vision system applications for image generation and facial recognition.

📄 File Format: PDF
💾 File Size: 16.8 MB

📄 Text Preview (First 20 pages)


📄 Page 1
MANNING | Mohamed Elgendy
📄 Page 2
Deep Learning for Vision Systems | MOHAMED ELGENDY | MANNING, Shelter Island
📄 Page 3
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact: Special Sales Department, Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964. Email: orders@manning.com

©2020 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Development editor: Jenny Stout
Technical development editor: Alain Couniot
Review editor: Ivan Martinović
Production editor: Lori Weidert
Copy editor: Tiffany Taylor
Proofreader: Keri Hales
Technical proofreader: Al Krinker
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964

ISBN: 9781617296192
Printed in the United States of America
📄 Page 4
To my mom, Huda, who taught me perseverance and kindness
To my dad, Ali, who taught me patience and purpose
To my loving and supportive wife, Amanda, who always inspires me to keep climbing
To my two-year-old daughter, Emily, who teaches me every day that AI still has a long way to go to catch up with even the tiniest humans
📄 Pages 6–12

contents

preface xiii
acknowledgments xv
about this book xvi
about the author xix
about the cover illustration xx

PART 1  DEEP LEARNING FOUNDATION  1

1  Welcome to computer vision  3
   1.1  Computer vision  4
        What is visual perception? 5 ■ Vision systems 5 ■ Sensing devices 7 ■ Interpreting devices 8
   1.2  Applications of computer vision  10
        Image classification 10 ■ Object detection and localization 12 ■ Generating art (style transfer) 12 ■ Creating images 13 ■ Face recognition 15 ■ Image recommendation system 15
   1.3  Computer vision pipeline: The big picture  17
   1.4  Image input  19
        Image as functions 19 ■ How computers see images 21 ■ Color images 21
   1.5  Image preprocessing  23
        Converting color images to grayscale to reduce computation complexity 23
   1.6  Feature extraction  27
        What is a feature in computer vision? 27 ■ What makes a good (useful) feature? 28 ■ Extracting features (handcrafted vs. automatic extracting) 31
   1.7  Classifier learning algorithm  33

2  Deep learning and neural networks  36
   2.1  Understanding perceptrons  37
        What is a perceptron? 38 ■ How does the perceptron learn? 43 ■ Is one neuron enough to solve complex problems? 43
   2.2  Multilayer perceptrons  45
        Multilayer perceptron architecture 46 ■ What are hidden layers? 47 ■ How many layers, and how many nodes in each layer? 47 ■ Some takeaways from this section 50
   2.3  Activation functions  51
        Linear transfer function 53 ■ Heaviside step function (binary classifier) 54 ■ Sigmoid/logistic function 55 ■ Softmax function 57 ■ Hyperbolic tangent function (tanh) 58 ■ Rectified linear unit 58 ■ Leaky ReLU 59
   2.4  The feedforward process  62
        Feedforward calculations 64 ■ Feature learning 65
   2.5  Error functions  68
        What is the error function? 69 ■ Why do we need an error function? 69 ■ Error is always positive 69 ■ Mean square error 70 ■ Cross-entropy 71 ■ A final note on errors and weights 72
   2.6  Optimization algorithms  74
        What is optimization? 74 ■ Batch gradient descent 77 ■ Stochastic gradient descent 83 ■ Mini-batch gradient descent 84 ■ Gradient descent takeaways 85
   2.7  Backpropagation  86
        What is backpropagation? 87 ■ Backpropagation takeaways 90

3  Convolutional neural networks  92
   3.1  Image classification using MLP  93
        Input layer 94 ■ Hidden layers 96 ■ Output layer 96 ■ Putting it all together 97 ■ Drawbacks of MLPs for processing images 99
   3.2  CNN architecture  102
        The big picture 102 ■ A closer look at feature extraction 104 ■ A closer look at classification 105
   3.3  Basic components of a CNN  106
        Convolutional layers 107 ■ Pooling layers or subsampling 114 ■ Fully connected layers 119
   3.4  Image classification using CNNs  121
        Building the model architecture 121 ■ Number of parameters (weights) 123
   3.5  Adding dropout layers to avoid overfitting  124
        What is overfitting? 125 ■ What is a dropout layer? 125 ■ Why do we need dropout layers? 126 ■ Where does the dropout layer go in the CNN architecture? 127
   3.6  Convolution over color images (3D images)  128
        How do we perform a convolution on a color image? 129 ■ What happens to the computational complexity? 130
   3.7  Project: Image classification for color images  133

4  Structuring DL projects and hyperparameter tuning  145
   4.1  Defining performance metrics  146
        Is accuracy the best metric for evaluating a model? 147 ■ Confusion matrix 147 ■ Precision and recall 148 ■ F-score 149
   4.2  Designing a baseline model  149
   4.3  Getting your data ready for training  151
        Splitting your data for train/validation/test 151 ■ Data preprocessing 153
   4.4  Evaluating the model and interpreting its performance  156
        Diagnosing overfitting and underfitting 156 ■ Plotting the learning curves 158
   4.5  Improving the network and tuning hyperparameters  162
        Collecting more data vs. tuning hyperparameters 162 ■ Parameters vs. hyperparameters 163 ■ Neural network hyperparameters 163 ■ Network architecture 164
   4.6  Learning and optimization  166
        Learning rate and decay schedule 166 ■ A systematic approach to find the optimal learning rate 169 ■ Learning rate decay and adaptive learning 170 ■ Mini-batch size 171
   4.7  Optimization algorithms  174
        Gradient descent with momentum 174 ■ Adam 175 ■ Number of epochs and early stopping criteria 175 ■ Early stopping 177
   4.8  Regularization techniques to avoid overfitting  177
        L2 regularization 177 ■ Dropout layers 179 ■ Data augmentation 180
   4.9  Batch normalization  181
        The covariate shift problem 181 ■ Covariate shift in neural networks 182 ■ How does batch normalization work? 183 ■ Batch normalization implementation in Keras 184 ■ Batch normalization recap 185
   4.10 Project: Achieve high accuracy on image classification  185

PART 2  IMAGE CLASSIFICATION AND DETECTION  193

5  Advanced CNN architectures  195
   5.1  CNN design patterns  197
   5.2  LeNet-5  199
        LeNet architecture 199 ■ LeNet-5 implementation in Keras 200 ■ Setting up the learning hyperparameters 202 ■ LeNet performance on the MNIST dataset 203
   5.3  AlexNet  203
        AlexNet architecture 205 ■ Novel features of AlexNet 205 ■ AlexNet implementation in Keras 207 ■ Setting up the learning hyperparameters 210 ■ AlexNet performance 211
   5.4  VGGNet  212
        Novel features of VGGNet 212 ■ VGGNet configurations 213 ■ Learning hyperparameters 216
   5.5  Inception and GoogLeNet  217
        Novel features of Inception 217 ■ Inception module: Naive version 218 ■ Inception module with dimensionality reduction 220 ■ Inception architecture 223 ■ GoogLeNet in Keras 225 ■ Learning hyperparameters 229 ■ Inception performance on the CIFAR dataset 229
   5.6  ResNet  230
        Novel features of ResNet 230 ■ Residual blocks 233 ■ ResNet implementation in Keras 235 ■ Learning hyperparameters 238 ■ ResNet performance on the CIFAR dataset 238

6  Transfer learning  240
   6.1  What problems does transfer learning solve?  241
   6.2  What is transfer learning?  243
   6.3  How transfer learning works  250
        How do neural networks learn features? 252 ■ Transferability of features extracted at later layers 254
   6.4  Transfer learning approaches  254
        Using a pretrained network as a classifier 254 ■ Using a pretrained network as a feature extractor 256 ■ Fine-tuning 258
   6.5  Choosing the appropriate level of transfer learning  260
        Scenario 1: Target dataset is small and similar to the source dataset 260 ■ Scenario 2: Target dataset is large and similar to the source dataset 261 ■ Scenario 3: Target dataset is small and different from the source dataset 261 ■ Scenario 4: Target dataset is large and different from the source dataset 261 ■ Recap of the transfer learning scenarios 262
   6.6  Open source datasets  262
        MNIST 263 ■ Fashion-MNIST 264 ■ CIFAR 264 ■ ImageNet 265 ■ MS COCO 266 ■ Google Open Images 267 ■ Kaggle 267
   6.7  Project 1: A pretrained network as a feature extractor  268
   6.8  Project 2: Fine-tuning  274

7  Object detection with R-CNN, SSD, and YOLO  283
   7.1  General object detection framework  285
        Region proposals 286 ■ Network predictions 287 ■ Non-maximum suppression (NMS) 288 ■ Object-detector evaluation metrics 289
   7.2  Region-based convolutional neural networks (R-CNNs)  292
        R-CNN 293 ■ Fast R-CNN 297 ■ Faster R-CNN 300 ■ Recap of the R-CNN family 308
   7.3  Single-shot detector (SSD)  310
        High-level SSD architecture 311 ■ Base network 313 ■ Multi-scale feature layers 315 ■ Non-maximum suppression 319
   7.4  You only look once (YOLO)  320
        How YOLOv3 works 321 ■ YOLOv3 architecture 324
   7.5  Project: Train an SSD network in a self-driving car application  326
        Step 1: Build the model 328 ■ Step 2: Model configuration 329 ■ Step 3: Create the model 330 ■ Step 4: Load the data 331 ■ Step 5: Train the model 333 ■ Step 6: Visualize the loss 334 ■ Step 7: Make predictions 335

PART 3  GENERATIVE MODELS AND VISUAL EMBEDDINGS  339

8  Generative adversarial networks (GANs)  341
   8.1  GAN architecture  343
        Deep convolutional GANs (DCGANs) 345 ■ The discriminator model 345 ■ The generator model 348 ■ Training the GAN 351 ■ GAN minimax function 354
   8.2  Evaluating GAN models  357
        Inception score 358 ■ Fréchet inception distance (FID) 358 ■ Which evaluation scheme to use 358
   8.3  Popular GAN applications  359
        Text-to-photo synthesis 359 ■ Image-to-image translation (Pix2Pix GAN) 360 ■ Image super-resolution GAN (SRGAN) 361 ■ Ready to get your hands dirty? 362
   8.4  Project: Building your own GAN  362

9  DeepDream and neural style transfer  374
   9.1  How convolutional neural networks see the world  375
        Revisiting how neural networks work 376 ■ Visualizing CNN features 377
   9.2  DeepDream  384
        How the DeepDream algorithm works 385 ■ DeepDream implementation in Keras 387
   9.3  Neural style transfer  392
        Content loss 393 ■ Style loss 396 ■ Total variance loss 397 ■ Network training 397

10 Visual embeddings  400
   10.1 Applications of visual embeddings  402
        Face recognition 402 ■ Image recommendation systems 403 ■ Object re-identification 405
   10.2 Learning embedding  406
   10.3 Loss functions  407
        Problem setup and formalization 408 ■ Cross-entropy loss 409 ■ Contrastive loss 410 ■ Triplet loss 411 ■ Naive implementation and runtime analysis of losses 412
   10.4 Mining informative data  414
        Dataloader 414 ■ Informative data mining: Finding useful triplets 416 ■ Batch all (BA) 419 ■ Batch hard (BH) 419 ■ Batch weighted (BW) 421 ■ Batch sample (BS) 421
   10.5 Project: Train an embedding network  423
        Fashion: Get me items similar to this 424 ■ Vehicle re-identification 424 ■ Implementation 426 ■ Testing a trained model 427
   10.6 Pushing the boundaries of current accuracy  431

appendix A  Getting set up  437
index  445
📄 Pages 14–15

preface

Two years ago, I decided to write a book to teach deep learning for computer vision from an intuitive perspective. My goal was to develop a comprehensive resource that takes learners from knowing only the basics of machine learning to building advanced deep learning algorithms that they can apply to solve complex computer vision problems.

The problem: In short, as of this moment, there are no books out there that teach deep learning for computer vision the way I wanted to learn about it. As a beginner machine learning engineer, I wanted to read one book that would take me from point A to point Z. I planned to specialize in building modern computer vision applications, and I wished that I had a single resource that would teach me everything I needed to do two things: 1) use neural networks to build an end-to-end computer vision application, and 2) be comfortable reading and implementing research papers to stay up-to-date with the latest industry advancements.

I found myself jumping between online courses, blogs, papers, and YouTube videos to create a comprehensive curriculum for myself. It's challenging to try to comprehend what is happening under the hood on a deeper level: not just a basic understanding, but how the concepts and theories make sense mathematically. It was impossible to find one comprehensive resource that (horizontally) covered the most important topics that I needed to learn to work on complex computer vision applications while also diving deep enough (vertically) to help me understand the math that makes the magic work.

As a beginner, I searched but couldn't find anything to meet these needs. So now I've written it. My goal has been to write a book that not only teaches the content I wanted when I was starting out, but also levels up your ability to learn on your own. My solution is a comprehensive book that dives deep both horizontally and vertically:

■ Horizontally—This book explains most topics that an engineer needs to learn to build production-ready computer vision applications, from neural networks and how they work to the different types of neural network architectures and how to train, evaluate, and tune the network.
■ Vertically—The book dives a level or two deeper than the code and explains intuitively (and gently) how the math works under the hood, to empower you to be comfortable reading and implementing research papers or even inventing your own techniques.

At the time of writing, I believe this is the only deep learning for vision systems resource that is taught this way. Whether you are looking for a job as a computer vision engineer, want to gain a deeper understanding of advanced neural network algorithms in computer vision, or want to build your product or startup, I wrote this book with you in mind. I hope you enjoy it.
📄 Page 16
acknowledgments

This book was a lot of work. No, make that really a lot of work! But I hope you will find it valuable. There are quite a few people I'd like to thank for helping me along the way.

I would like to thank the people at Manning who made this book possible: publisher Marjan Bace and everyone on the editorial and production teams, including Jennifer Stout, Tiffany Taylor, Lori Weidert, Katie Tennant, and many others who worked behind the scenes.

Many thanks go to the technical peer reviewers led by Alain Couniot—Al Krinker, Albert Choy, Alessandro Campeis, Bojan Djurkovic, Burhan ul haq, David Fombella Pombal, Ishan Khurana, Ita Cirovic Donev, Jason Coleman, Juan Gabriel Bono, Juan José Durillo Barrionuevo, Michele Adduci, Millad Dagdoni, Peter Hraber, Richard Vaughan, Rohit Agarwal, Tony Holdroyd, Tymoteusz Wolodzko, and Will Fuger—and the active readers who contributed their feedback in the book forums. Their contributions included catching typos, code errors, and technical mistakes, as well as making valuable topic suggestions. Each pass through the review process and each piece of feedback implemented through the forum topics shaped and molded the final version of this book.

Finally, thank you to the entire Synapse Technology team. You've created something that's incredibly cool. Thank you to Simanta Guatam, Aleksandr Patsekin, Jay Patel, and others for answering my questions and brainstorming ideas for the book.
📄 Pages 17–19

about this book

Who should read this book
If you know the basic machine learning framework, can hack around in Python, and want to learn how to build and train advanced, production-ready neural networks to solve complex computer vision problems, I wrote this book for you. The book was written for anyone with intermediate Python experience and basic machine learning understanding who wishes to explore training deep neural networks and learn to apply deep learning to solve computer vision problems.

When I started writing the book, my primary goal was as follows: "I want to write a book to grow readers' skills, not teach them content." To achieve this goal, I had to keep an eye on two main tenets:

1 Teach you how to learn. I don't want to read a book that just goes through a set of scientific facts. I can get that on the internet for free. If I read a book, I want to finish it having grown my skillset so I can study the topic further. I want to learn how to think about the presented solutions and come up with my own.
2 Go very deep. If I'm successful in satisfying the first tenet, that makes this one easy. If you learn how to learn new concepts, that allows me to dive deep without worrying that you might fall behind. This book doesn't avoid the math part of the learning, because understanding the mathematical equations will empower you with the best skill in the AI world: the ability to read research papers, compare innovations, and make the right decisions about implementing new concepts in your own problems. But I promise to introduce only the mathematical concepts you need, and I promise to present them in a way that doesn't interrupt your flow, so you can follow the concepts without the math part if you prefer.

How this book is organized: A roadmap
This book is structured into three parts. The first part explains deep learning in detail as a foundation for the remaining topics. I strongly recommend that you not skip this section, because it dives deep into neural network components and definitions and explains all the notions required to understand how neural networks work under the hood. After reading part 1, you can jump directly to topics of interest in the remaining chapters. Part 2 explains deep learning techniques to solve object classification and detection problems, and part 3 explains deep learning techniques to generate images and visual embeddings. In several chapters, practical projects implement the topics discussed.

About the code
All of this book's code examples use open source frameworks that are free to download. We will be using Python, TensorFlow, Keras, and OpenCV. Appendix A walks you through the complete setup. I also recommend that you have access to a GPU if you want to run the book projects on your machine, because chapters 6–10 contain more complex projects to train deep networks that would take a long time on a regular CPU. Another option is to use a cloud environment like Google Colab for free, or other paid options. (A minimal environment-check sketch appears after this section.)

Examples of source code occur both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. Sometimes code is also in bold to highlight code that has changed from previous steps in the chapter, such as when a new feature is added to an existing line of code. In many cases, the original source code has been reformatted; we've added line breaks and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and listings include line-continuation markers (➥). Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.

The code for the examples in this book is available for download from the Manning website at www.manning.com/books/deep-learning-for-vision-systems and from GitHub at https://github.com/moelgendy/deep_learning_for_vision_systems.

liveBook discussion forum
Purchase of Deep Learning for Vision Systems includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the author and from other users. To access the forum, go to https://livebook.manning.com/#!/book/deep-learning-for-vision-systems/discussion. You can also learn more about Manning's forums and the rules of conduct at https://livebook.manning.com/#!/discussion.

Manning's commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the author some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher's website as long as the book is in print.
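To make the "About the code" setup advice above concrete, here is a minimal environment check you might run before starting the projects. This is an editorial sketch, not part of the book's official code: it assumes TensorFlow 2.x (which bundles Keras) and OpenCV are installed as appendix A describes, and it only verifies library versions and GPU visibility.

```python
# Minimal environment sanity check (a sketch; assumes TensorFlow 2.x and
# OpenCV installed per appendix A -- not part of the book's official code).
import tensorflow as tf
import cv2

print("TensorFlow:", tf.__version__)    # the book's projects target TF 2.x
print("Keras:", tf.keras.__version__)   # Keras ships inside TensorFlow 2.x
print("OpenCV:", cv2.__version__)

# Chapters 6-10 train deep networks; a GPU (local or cloud) speeds them up a lot.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    print(f"{len(gpus)} GPU(s) visible:", [gpu.name for gpu in gpus])
else:
    print("No GPU visible; consider a cloud environment such as Google Colab.")
```

On Google Colab, the same check typically works out of the box after switching the runtime type to GPU, since Colab usually ships with TensorFlow and OpenCV preinstalled.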
📄 Page 20
about the author Mohamed Elgendy is the vice president of engineering at Rakuten, where he is lead- ing the development of its AI platform and products. Previously, he served as head of engineering at Synapse Technology, building proprietary computer vision applica- tions to detect threats at security checkpoints worldwide. At Amazon, Mohamed built and managed the central AI team that serves as a deep learning think tank for Ama- zon engineering teams like AWS and Amazon Go. He also developed the deep learn- ing for computer vision curriculum at Amazon’s Machine University. Mohamed regularly speaks at AI conferences like Amazon’s DevCon, O’Reilly’s AI conference, and Google’s I/O.xix
The above is a preview of the first 20 pages.
