[Figure: output of plot_model showing the layers in the model — InputLayer (day, daym, month), Embedding, BatchNormalization, Flatten, Dropout, Concatenate, and Dense layers]
Deep Learning with Structured Data
Deep Learning with Structured Data

MARK RYAN

MANNING
SHELTER ISLAND
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

©2020 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Christina Taylor
Technical development editor: Al Krinker
Review editor: Ivan Martinović
Production editor: Lori Weidert
Copy editor: Keir Simpson
Proofreader: Melody Dolab
Technical proofreader: Karsten Strobek
Typesetter: Gordan Salinovic
Cover designer: Marija Tudor

ISBN 9781617296727
Printed in the United States of America
To my daughter, Josephine, who always reminds me that God is the Author.
brief contents

1 ■ Why deep learning with structured data? 1
2 ■ Introduction to the example problem and Pandas dataframes 18
3 ■ Preparing the data, part 1: Exploring and cleansing the data 45
4 ■ Preparing the data, part 2: Transforming the data 67
5 ■ Preparing and building the model 87
6 ■ Training the model and running experiments 120
7 ■ More experiments with the trained model 150
8 ■ Deploying the model 161
9 ■ Recommended next steps 192
contents

preface xv
acknowledgments xvii
about this book xviii
about the author xxii
about the cover illustration xxiii

1 Why deep learning with structured data? 1
1.1 Overview of deep learning 2
1.2 Benefits and drawbacks of deep learning 6
1.3 Overview of the deep learning stack 9
1.4 Structured vs. unstructured data 10
1.5 Objections to deep learning with structured data 12
1.6 Why investigate deep learning with a structured data problem? 14
1.7 An overview of the code accompanying this book 14
1.8 What you need to know 15
1.9 Summary 16

2 Introduction to the example problem and Pandas dataframes 18
2.1 Development environment options for deep learning 19
2.2 Code for exploring Pandas 21
2.3 Pandas dataframes in Python 22
2.4 Ingesting CSV files into Pandas dataframes 24
2.5 Using Pandas to do what you would do with SQL 25
2.6 The major example: Predicting streetcar delays 28
2.7 Why is a real-world dataset critical for learning about deep learning? 30
2.8 Format and scope of the input dataset 31
2.9 The destination: An end-to-end solution 33
2.10 More details on the code that makes up the solutions 35
2.11 Development environments: Vanilla vs. deep-learning-enabled 37
2.12 A deeper look at the objections to deep learning 38
2.13 How deep learning has become more accessible 41
2.14 A first taste of training a deep learning model 42
2.15 Summary 44

3 Preparing the data, part 1: Exploring and cleansing the data 45
3.1 Code for exploring and cleansing the data 46
3.2 Using config files with Python 46
3.3 Ingesting XLS files into a Pandas dataframe 48
3.4 Using pickle to save your Pandas dataframe from one session to another 52
3.5 Exploring the data 54
3.6 Categorizing data into continuous, categorical, and text categories 58
3.7 Cleaning up problems in the dataset: missing data, errors, and guesses 60
3.8 Finding out how much data deep learning needs 65
3.9 Summary 66

4 Preparing the data, part 2: Transforming the data 67
4.1 Code for preparing and transforming the data 68
4.2 Dealing with incorrect values: Routes 68
4.3 Why only one substitute for all bad values? 70
4.4 Dealing with incorrect values: Vehicles 71
4.5 Dealing with inconsistent values: Location 72
4.6 Going the distance: Locations 74
4.7 Fixing type mismatches 77
4.8 Dealing with rows that still contain bad data 78
4.9 Creating derived columns 79
4.10 Preparing non-numeric data to train a deep learning model 80
4.11 Overview of the end-to-end solution 83
4.12 Summary 86

5 Preparing and building the model 87
5.1 Data leakage and features that are fair game for training the model 88
5.2 Domain expertise and minimal scoring tests to prevent data leakage 89
5.3 Preventing data leakage in the streetcar delay prediction problem 90
5.4 Code for exploring Keras and building the model 92
5.5 Deriving the dataframe to use to train the model 93
5.6 Transforming the dataframe into the format expected by the Keras model 97
5.7 A brief history of Keras and TensorFlow 98
5.8 Migrating from TensorFlow 1.x to TensorFlow 2 99
5.9 TensorFlow vs. PyTorch 100
5.10 The structure of a deep learning model in Keras 100
5.11 How the data structure defines the Keras model 104
5.12 The power of embeddings 107
5.13 Code to build a Keras model automatically based on the data structure 109
5.14 Exploring your model 111
5.15 Model parameters 117
5.16 Summary 119

6 Training the model and running experiments 120
6.1 Code for training the deep learning model 121
6.2 Reviewing the process of training a deep learning model 121
6.3 Reviewing the overall goal of the streetcar delay prediction model 124
6.4 Selecting the train, validation, and test datasets 127
6.5 Initial training run 127
6.6 Measuring the performance of your model 130
6.7 Keras callbacks: Getting the best out of your training runs 133
6.8 Getting identical results from multiple training runs 140
6.9 Shortcuts to scoring 141
6.10 Explicitly saving trained models 143
6.11 Running a series of training experiments 143
6.12 Summary 148

7 More experiments with the trained model 150
7.1 Code for more experiments with the model 151
7.2 Validating whether removing bad values improves the model 151
7.3 Validating whether embeddings for columns improve the performance of the model 152
7.4 Comparing the deep learning model with XGBoost 153
7.5 Possible next steps for improving the deep learning model 159
7.6 Summary 160

8 Deploying the model 161
8.1 Overview of model deployment 162
8.2 If deployment is so important, why is it so hard? 163
8.3 Review of one-off scoring 164
8.4 The user experience with web deployment 165
8.5 Steps to deploy your model with web deployment 165
8.6 Behind the scenes with web deployment 169
8.7 The user experience with Facebook Messenger deployment 172
8.8 Behind the scenes with Facebook Messenger deployment 174
8.9 More background on Rasa 175
8.10 Steps to deploy your model in Facebook Messenger with Rasa 177
8.11 Introduction to pipelines 180
8.12 Defining pipelines in the model training phase 183
8.13 Applying pipelines in the scoring phase 186
8.14 Maintaining a model after deployment 188
8.15 Summary 190

9 Recommended next steps 192
9.1 Reviewing what we have covered so far 193
9.2 What we could do next with the streetcar delay prediction project 194
9.3 Adding location details to the streetcar delay prediction project 194
9.4 Training our deep learning model with weather data 198
9.5 Adding season or time of day to the streetcar delay prediction project 203
9.6 Imputation: An alternative to removing records with bad values 204
9.7 Making the web deployment of the streetcar delay prediction model generally available 204
9.8 Adapting the streetcar delay prediction model to a new dataset 206
9.9 Preparing the dataset and training the model 209
9.10 Deploying the model with web deployment 211
9.11 Deploying the model with Facebook Messenger 212
9.12 Adapting the approach in this book to a different dataset 215
9.13 Resources for additional learning 219
9.14 Summary 220

appendix Using Google Colaboratory 223
index 233
preface

I believe that when people look back in 50 years and assess the first two decades of the century, deep learning will be at the top of the list of technical innovations. The theoretical foundations of deep learning were established in the 1950s, but it wasn’t until 2012 that the potential of deep learning became evident to nonspecialists. Now, almost a decade later, deep learning pervades our lives, from smart speakers that are able to seamlessly convert our speech into text to systems that can beat any human in an ever-expanding range of games.

This book examines an overlooked corner of the deep learning world: applying deep learning to structured, tabular data (that is, data organized in rows and columns). If the conventional wisdom is to avoid using deep learning with structured data, and the marquee applications of deep learning (such as image recognition) deal with nonstructured data, why should you read a book about deep learning with structured data? First, as I argue in chapters 1 and 2, some of the objections to using deep learning to solve structured data problems (such as deep learning being too complex or structured datasets being too small) simply don’t hold water today. When we are assessing which machine learning approach to apply to a structured data problem, we need to keep an open mind and consider deep learning as a potential solution. Second, although nontabular data underpins many topical application areas of deep learning (such as image recognition, speech to text, and machine translation), our lives as consumers, employees, and citizens are still largely defined by data in tables. Every bank transaction, every tax payment, every insurance claim, and hundreds more aspects of our daily existence flow through structured, tabular data. Whether you are a newcomer to deep learning or an experienced practitioner, you owe it to yourself to have deep learning in your toolbox when you tackle a problem that involves structured data.

By reading this book, you will learn what you need to know to apply deep learning to a wide variety of structured data problems. You will work through a full-blown application of deep learning to a real-world dataset, from preparing the data to training the deep learning model to deploying the trained model. The code examples that accompany the book are written in Python, the lingua franca of machine learning, and take advantage of the Keras/TensorFlow framework, the most common platform for deep learning in industry.
acknowledgments

I have many people to thank for their support and assistance over the year and a half that I wrote this book. First, I would like to thank the team at Manning Publications, particularly my editor, Christina Taylor, for their masterful direction. I would like to thank my former supervisors at IBM—in particular Jessica Rockwood, Michael Kwok, and Al Martin—for giving me the impetus to write this book. I would like to thank my current team at Intact for their support—in particular Simon Marchessault-Groleau, Dany Simard, and Nicolas Beaupré. My friends have given me consistent encouragement. I would like to particularly thank Dr. Laurence Mussio and Flavia Mussio, both of whom have been unalloyed and enthusiastic supporters of my writing. Jamie Roberts, Luc Chamberland, Alan Hall, Peter Moroney, Fred Gandolfi, and Alina Zhang have all provided encouragement. Finally, I would like to thank my family—Steve and Carol, John and Debby, and Nina—for their love. (“We’re a literary family, thank God.”)

To all the reviewers: Aditya Kaushik, Atul Saurav, Gary Bake, Gregory Matuszek, Guy Langston, Hao Liu, Ike Okonkwo, Irfan Ullah, Ishan Khurana, Jared Wadsworth, Jason Rendel, Jeff Hajewski, Jesús Manuel López Becerra, Joe Justesen, Juan Rufes, Julien Pohie, Kostas Passadis, Kunal Ghosh, Malgorzata Rodacka, Matthias Busch, Michael Jensen, Monica Guimaraes, Nicole Koenigstein, Rajkumar Palani, Raushan Jha, Sayak Paul, Sean T Booker, Stefano Ongarello, Tony Holdroyd, and Vlad Navitski, your suggestions helped make this a better book.
about this book

This book takes you through the full journey of applying deep learning to a tabular, structured dataset. By working through an extended, real-world example, you will learn how to clean up a messy dataset and use it to train a deep learning model by using the popular Keras framework. Then you will learn how to make your trained deep learning model available to the world through a web page or a chatbot in Facebook Messenger. Finally, you will learn how to extend and improve your deep learning model, as well as how to apply the approach shown in this book to other problems involving structured data.

Who should read this book

To get the most out of this book, you should be familiar with Python coding in the context of Jupyter Notebooks. You should also be familiar with some non-deep-learning machine learning approaches, such as logistic regression and support vector machines, and be familiar with the standard vocabulary of machine learning. Finally, if you regularly work with data that is organized in tables as rows and columns, you will find it easiest to apply the concepts in this book to your work.

How this book is organized: A roadmap

This book is made up of nine chapters and one appendix:

■ Chapter 1 includes a quick review of the high-level concepts of deep learning and a summary of why (and why not) you would want to apply deep learning to structured data. It also explains what I mean by structured data.