Building Machine Learning Pipelines: Automating Model Life Cycles with TensorFlow

Hannes Hapke, Catherine Nelson

Science

Companies are spending billions on machine learning projects, but it's money wasted if the models can't be deployed effectively. In this practical guide, Hannes Hapke and Catherine Nelson walk you through the steps of automating a machine learning pipeline using the TensorFlow ecosystem. You'll learn the techniques and tools that will cut deployment time from days to minutes, so that you can focus on developing new models rather than maintaining legacy systems.

Data scientists, machine learning engineers, and DevOps engineers will discover how to go beyond model development to successfully productize their data science projects, while managers will better understand the role they play in helping to accelerate these projects. The book also explores new approaches for integrating data privacy into machine learning pipelines.

• Understand the machine learning management lifecycle
• Implement data pipelines with Apache Airflow and Kubeflow Pipelines
• Work with data using TensorFlow tools like ML Metadata, TensorFlow Data Validation, and TensorFlow Transform
• Analyze models with TensorFlow Model Analysis and ship them with the TFX Model Pusher Component after the ModelValidator TFX Component confirmed that the analysis results are an improvement
• Deploy models in a variety of environments with TensorFlow Serving, TensorFlow Lite, and TensorFlow.js
• Learn methods for adding privacy, including differential privacy with TensorFlow Privacy and federated learning with TensorFlow Federated
• Design model feedback loops to increase your data sets and learn when to update your machine learning models

Format PDF

Size 15.7 MB

Text Preview (First 20 pages)

Page 1

Building Machine Learning Pipelines Automating Model Life Cycles with TensorFlow Hannes Hapke & Catherine Nelson Foreword By Aurélien Géron

Page 3

Praise for Building Machine Learning Pipelines

“I wish this book had existed when I started working in production ML! It’s an outstanding resource for getting a comprehensive view of production ML systems in general, and TFX in particular. Hannes and Catherine have worked directly with the TensorFlow team to get the most accurate information available for including in this book, and then explained it in clear, concise explanations and examples.”
—Robert Crowe, TensorFlow Developer Advocate, Google

“The data science practitioner knows that real-world machine learning involves more than just machine learning model training. This book demystifies the hidden technical debt in modern machine learning workflows such that you can put the lab and factory data science patterns into production as repeatable workflows.”
—Josh Patterson, CEO, Patterson Consulting, Coauthor of Deep Learning: A Practitioner’s Approach and Kubeflow Operations Guide

“This is definitely the book to read if you would like to understand how to build ML pipelines that are automated, scalable, and reproducible! You will learn something useful from it whether you are a data scientist, machine learning engineer, software engineer, or DevOps. It also covers the latest features of TFX and its components.”
—Margaret Maynard-Reid, Machine Learning Engineer, Tiny Peppers, ML GDE (Google Developer Expert), GDG Seattle Lead Organizer

Page 4

“Wonderfully readable, Building Machine Learning Pipelines serves not only as a comprehensive guide to help data scientists and ML engineers build automated and reproducible ML pipelines, but it is also the only authoritative book on the subject. The book provides an overview of the clearly defined components needed to architect ML pipelines successfully and walks you through hands-on code examples in a practical manner.”
—Adewale Akinfaderin, Data Scientist, Amazon Web Services

“I really enjoyed reading Building Machine Learning Pipelines. Having used TFX for several years internally at Google as it was growing, I must say I wish I had your book back then instead of figuring this all out on my own. You would have saved me many months of effort and confusion. Thanks for writing such a high quality guide!”
—Lucas Ackerknecht, Machine Learning Specialist, Anti-Abuse Machine Learning, Google

“We all have some of these amazing prototype models lying around. This book will introduce you to the tools and techniques that will help you take that prototype to production. Not only that but you will also build a complete end-to-end pipeline around it so that any future enhancements get delivered automatically and smoothly. This is a great book for beginners in ML ops who want to take their skills to the next level and collaborate with larger teams to help realize the values of innovative new models.”
—Vikram Tiwari, Cofounder, Omni Labs, Inc.

“As a person who had only used TensorFlow as a framework for training deep learning models, when reading this book I was amazed at the pipeline capabilities that the TensorFlow ecosystem has to offer. This book is a great guide to all of the tools for analyzing and deploying models available with TFX, and is easy to read and use for people looking to make their first machine learning pipeline with TensorFlow.”
—Dr. Jacqueline Nolis, Principal Data Scientist, Brightloom and Coauthor of Build a Career in Data Science

“This book is an exceptional deep-dive into Machine Learning Engineering. You will find cogent and practical examples of what it takes to build production-ready ML infrastructure. I would consider this required reading for any engineer or data scientist who intends to apply ML to real-world problems.”
—Leigh Johnson, Staff Engineer, Machine Learning Services, Slack

Page 5

Hannes Hapke and Catherine Nelson

Building Machine Learning Pipelines
Automating Model Life Cycles with TensorFlow

Beijing Boston Farnham Sebastopol Tokyo

Page 6

978-1-492-05319-4 [LSI]

Building Machine Learning Pipelines
by Hannes Hapke and Catherine Nelson

Copyright © 2020 Hannes Hapke and Catherine Nelson. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Jonathan Hassell
Developmental Editors: Amelia Blevins, Nicole Tachè
Production Editor: Katherine Tozer
Copyeditor: Tom Sullivan
Proofreader: Piper Editorial, LLC
Indexer: Ellen Troutman-Zaig
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

August 2020: First Edition

Revision History for the First Edition
2020-07-13: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781492053194 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Building Machine Learning Pipelines, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors, and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

Page 7

Table of Contents

Foreword
Preface

1. Introduction
  Why Machine Learning Pipelines?
  When to Think About Machine Learning Pipelines
  Overview of the Steps in a Machine Learning Pipeline
  Data Ingestion and Data Versioning
  Data Validation
  Data Preprocessing
  Model Training and Tuning
  Model Analysis
  Model Versioning
  Model Deployment
  Feedback Loops
  Data Privacy
  Pipeline Orchestration
  Why Pipeline Orchestration?
  Directed Acyclic Graphs
  Our Example Project
  Project Structure
  Our Machine Learning Model
  Goal of the Example Project
  Summary

Page 8

2. Introduction to TensorFlow Extended
  What Is TFX?
  Installing TFX
  Overview of TFX Components
  What Is ML Metadata?
  Interactive Pipelines
  Alternatives to TFX
  Introduction to Apache Beam
  Setup
  Basic Data Pipeline
  Executing Your Basic Pipeline
  Summary

3. Data Ingestion
  Concepts for Data Ingestion
  Ingesting Local Data Files
  Ingesting Remote Data Files
  Ingesting Data Directly from Databases
  Data Preparation
  Splitting Datasets
  Spanning Datasets
  Versioning Datasets
  Ingestion Strategies
  Structured Data
  Text Data for Natural Language Problems
  Image Data for Computer Vision Problems
  Summary

4. Data Validation
  Why Data Validation?
  TFDV
  Installation
  Generating Statistics from Your Data
  Generating Schema from Your Data
  Recognizing Problems in Your Data
  Comparing Datasets
  Updating the Schema
  Data Skew and Drift
  Biased Datasets
  Slicing Data in TFDV

Page 9

  Processing Large Datasets with GCP
  Integrating TFDV into Your Machine Learning Pipeline
  Summary

5. Data Preprocessing
  Why Data Preprocessing?
  Preprocessing the Data in the Context of the Entire Dataset
  Scaling the Preprocessing Steps
  Avoiding a Training-Serving Skew
  Deploying Preprocessing Steps and the ML Model as One Artifact
  Checking Your Preprocessing Results in Your Pipeline
  Data Preprocessing with TFT
  Installation
  Preprocessing Strategies
  Best Practices
  TFT Functions
  Standalone Execution of TFT
  Integrate TFT into Your Machine Learning Pipeline
  Summary

6. Model Training
  Defining the Model for Our Example Project
  The TFX Trainer Component
  run_fn() Function
  Running the Trainer Component
  Other Trainer Component Considerations
  Using TensorBoard in an Interactive Pipeline
  Distribution Strategies
  Model Tuning
  Strategies for Hyperparameter Tuning
  Hyperparameter Tuning in TFX Pipelines
  Summary

7. Model Analysis and Validation
  How to Analyze Your Model
  Classification Metrics
  Regression Metrics
  TensorFlow Model Analysis
  Analyzing a Single Model in TFMA
  Analyzing Multiple Models in TFMA
  Model Analysis for Fairness

Page 10

  Slicing Model Predictions in TFMA
  Checking Decision Thresholds with Fairness Indicators
  Going Deeper with the What-If Tool
  Model Explainability
  Generating Explanations with the WIT
  Other Explainability Techniques
  Analysis and Validation in TFX
  ResolverNode
  Evaluator Component
  Validation in the Evaluator Component
  TFX Pusher Component
  Summary

8. Model Deployment with TensorFlow Serving
  A Simple Model Server
  The Downside of Model Deployments with Python-Based APIs
  Lack of Code Separation
  Lack of Model Version Control
  Inefficient Model Inference
  TensorFlow Serving
  TensorFlow Architecture Overview
  Exporting Models for TensorFlow Serving
  Model Signatures
  Inspecting Exported Models
  Setting Up TensorFlow Serving
  Docker Installation
  Native Ubuntu Installation
  Building TensorFlow Serving from Source
  Configuring a TensorFlow Server
  REST Versus gRPC
  Making Predictions from the Model Server
  Getting Model Predictions via REST
  Using TensorFlow Serving via gRPC
  Model A/B Testing with TensorFlow Serving
  Requesting Model Metadata from the Model Server
  REST Requests for Model Metadata
  gRPC Requests for Model Metadata
  Batching Inference Requests
  Configuring Batch Predictions
  Other TensorFlow Serving Optimizations
  TensorFlow Serving Alternatives

Page 11

  BentoML
  Seldon
  GraphPipe
  Simple TensorFlow Serving
  MLflow
  Ray Serve
  Deploying with Cloud Providers
  Use Cases
  Example Deployment with GCP
  Model Deployment with TFX Pipelines
  Summary

9. Advanced Model Deployments with TensorFlow Serving
  Decoupling Deployment Cycles
  Workflow Overview
  Optimization of Remote Model Loading
  Model Optimizations for Deployments
  Quantization
  Pruning
  Distillation
  Using TensorRT with TensorFlow Serving
  TFLite
  Steps to Optimize Your Model with TFLite
  Serving TFLite Models with TensorFlow Serving
  Monitoring Your TensorFlow Serving Instances
  Prometheus Setup
  TensorFlow Serving Configuration
  Simple Scaling with TensorFlow Serving and Kubernetes
  Summary

10. Advanced TensorFlow Extended
  Advanced Pipeline Concepts
  Training Multiple Models Simultaneously
  Exporting TFLite Models
  Warm Starting Model Training
  Human in the Loop
  Slack Component Setup
  How to Use the Slack Component
  Custom TFX Components
  Use Cases of Custom Components
  Writing a Custom Component from Scratch

Page 12

  Reusing Existing Components
  Summary

11. Pipelines Part 1: Apache Beam and Apache Airflow
  Which Orchestration Tool to Choose?
  Apache Beam
  Apache Airflow
  Kubeflow Pipelines
  Kubeflow Pipelines on AI Platform
  Converting Your Interactive TFX Pipeline to a Production Pipeline
  Simple Interactive Pipeline Conversion for Beam and Airflow
  Introduction to Apache Beam
  Orchestrating TFX Pipelines with Apache Beam
  Introduction to Apache Airflow
  Installation and Initial Setup
  Basic Airflow Example
  Orchestrating TFX Pipelines with Apache Airflow
  Pipeline Setup
  Pipeline Execution
  Summary

12. Pipelines Part 2: Kubeflow Pipelines
  Introduction to Kubeflow Pipelines
  Installation and Initial Setup
  Accessing Your Kubeflow Pipelines Installation
  Orchestrating TFX Pipelines with Kubeflow Pipelines
  Pipeline Setup
  Executing the Pipeline
  Useful Features of Kubeflow Pipelines
  Pipelines Based on Google Cloud AI Platform
  Pipeline Setup
  TFX Pipeline Setup
  Pipeline Execution
  Summary

13. Feedback Loops
  Explicit and Implicit Feedback
  The Data Flywheel
  Feedback Loops in the Real World
  Design Patterns for Collecting Feedback
  Users Take Some Action as a Result of the Prediction

Page 13

  Users Rate the Quality of the Prediction
  Users Correct the Prediction
  Crowdsourcing the Annotations
  Expert Annotations
  Producing Feedback Automatically
  How to Track Feedback Loops
  Tracking Explicit Feedback
  Tracking Implicit Feedback
  Summary

14. Data Privacy for Machine Learning
  Data Privacy Issues
  Why Do We Care About Data Privacy?
  The Simplest Way to Increase Privacy
  What Data Needs to Be Kept Private?
  Differential Privacy
  Local and Global Differential Privacy
  Epsilon, Delta, and the Privacy Budget
  Differential Privacy for Machine Learning
  Introduction to TensorFlow Privacy
  Training with a Differentially Private Optimizer
  Calculating Epsilon
  Federated Learning
  Federated Learning in TensorFlow
  Encrypted Machine Learning
  Encrypted Model Training
  Converting a Trained Model to Serve Encrypted Predictions
  Other Methods for Data Privacy
  Summary

15. The Future of Pipelines and Next Steps
  Model Experiment Tracking
  Thoughts on Model Release Management
  Future Pipeline Capabilities
  TFX with Other Machine Learning Frameworks
  Testing Machine Learning Models
  CI/CD Systems for Machine Learning
  Machine Learning Engineering Community
  Summary

A. Introduction to Infrastructure for Machine Learning

Page 14

B. Setting Up a Kubernetes Cluster on Google Cloud
C. Tips for Operating Kubeflow Pipelines

Index

Page 15

Foreword

When Henry Ford’s company built its first moving assembly line in 1913 to produce its legendary Model T, it cut the time it took to build each car from 12 to 3 hours. This drastically reduced costs, allowing the Model T to become the first affordable automobile in history. It also made mass production possible: soon, roads were flooded with Model Ts.

Since the production process was now a clear sequence of well-defined steps (aka, a pipeline), it became possible to automate some of these steps, saving even more time and money. Today, cars are mostly built by machines.

But it’s not just about time and money. For many repetitive tasks, a machine will produce much more consistent results than humans, making the final product more predictable, consistent, and reliable. Lastly, by keeping humans away from heavy machinery, safety is greatly improved, and many workers went on to perform higher-level jobs (although to be fair, many others just lost their jobs).

On the flip side, setting up an assembly line can be a long and costly process. And it’s not ideal if you want to produce small quantities or highly customized products. Ford famously said, “Any customer can have a car painted any color that he wants, so long as it is black.”

The history of car manufacturing has repeated itself in the software industry over the last couple of decades: every significant piece of software nowadays is typically built, tested, and deployed using automation tools such as Jenkins or Travis. However, the Model T metaphor isn’t sufficient anymore. Software doesn’t just get deployed and forgotten; it must be monitored, maintained, and updated regularly. Software pipelines now look more like dynamic loops than static production lines. It’s crucial to be

Page 16

able to quickly update the software (or the pipeline itself) without ever breaking it. And software is much more customizable than the Model T ever was: software can be painted any color (e.g., try counting the number of MS Office variants that exist).

Unfortunately, “classical” automation tools are not well suited to handle a full machine learning pipeline. Indeed, an ML model is not a regular piece of software. For one, a large part of its behavior is driven by the data it trains on. Therefore, the training data itself must be treated as code (e.g., versioned). This is quite a tricky problem because new data pops up every day (often in large quantities), usually evolves and drifts over time, often includes private data, and must be labelled before you can feed it to supervised learning algorithms. Second, the behavior of a model is often quite opaque: it may pass all the tests on some data but fail entirely on others. So you must ensure that your tests cover all the data domains on which your model will be used in production. In particular, you must make sure that it doesn’t discriminate against a subset of your users.

For these (and other) reasons, data scientists and software engineers first started building and training ML models manually, “in their garage,” so to speak, and many of them still do. But new automation tools have been developed in the past few years that tackle the challenges of ML pipelines, such as TensorFlow Extended (TFX) and Kubeflow. More and more organizations are starting to use these tools to create ML pipelines that automate most (or all) of the steps involved in building and training ML models. The benefits of this automation are mostly the same as for the car industry: save time and money; build better, more reliable, and safer models; and spend more time doing more useful tasks than copying data or staring at learning curves.

However, building an ML pipeline is not trivial. So where should you start? Well, right here!

In this book, Hannes and Catherine provide a clear guide to start automating your ML pipelines. As a firm believer in the hands-on approach, especially for such a technical topic, I particularly enjoyed the way this book guides you step by step through a concrete example project from start to finish. Thanks to the many code examples and the clear, concise explanations, you should have your own ML pipeline up and running in no time, as well as all the conceptual tools required to adapt these ML pipelines to your own use cases. I highly recommend you grab your laptop and actually try things out as you read; you will learn much faster.

I first met Hannes and Catherine in October 2019 at the TensorFlow World conference in Santa Clara, CA, where I was speaking on building ML pipelines using TFX. They were working on this book on the same topic, and we shared the same editor, so naturally we had a lot to talk about. Some participants in my course had asked very technical questions about TensorFlow Serving (which is part of TFX), and Hannes and Catherine had all the answers I was looking for. Hannes even kindly accepted my

Page 17

invitation to give a talk on advanced features of TensorFlow Serving at the end of my course on very short notice. His talk was a treasure trove of insights and helpful tips, all of which you will find in this book, along with many, many more.

Now it’s time to start building professional ML pipelines!

— Aurélien Géron
Former YouTube Video Classification Team Lead
Author of Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (O’Reilly)
Auckland, New Zealand, June 18, 2020

Page 19

Preface

Everybody’s talking about machine learning. It’s moved from an academic discipline to one of the most exciting technologies around. From understanding video feeds in self-driving cars to personalizing medications, it’s becoming important in every industry. While the model architectures and concepts have received a lot of attention, machine learning has yet to go through the standardization of processes that the software industry experienced in the last two decades. In this book, we’d like to show you how to build a standardized machine learning system that is automated and results in models that are reproducible.

What Are Machine Learning Pipelines?

During the last few years, the developments in the field of machine learning have been astonishing. With the broad availability of graphical processing units (GPUs) and the rise of new deep learning concepts like Transformers such as BERT, or Generative Adversarial Networks (GANs) such as deep convolutional GANs, the number of AI projects has skyrocketed. The number of AI startups is enormous. Organizations are increasingly applying the latest machine learning concepts to all kinds of business problems. In this rush for the most performant machine learning solution, we have observed a few things that have received less attention. We have seen that data scientists and machine learning engineers are lacking good sources of information for concepts and tools to accelerate, reuse, manage, and deploy their developments. What is needed is the standardization of machine learning pipelines.

Machine learning pipelines implement and formalize processes to accelerate, reuse, manage, and deploy machine learning models. Software engineering went through the same changes a decade or so ago with the introduction of continuous integration (CI) and continuous deployment (CD). Back in the day, it was a lengthy process to test and deploy a web app. These days, these processes have been greatly simplified by a few tools and concepts. Previously, the deployment of web apps required collaboration between a DevOps engineer and the software developer. Today, the app

Page 20

can be tested and deployed reliably in a matter of minutes. Data scientists and machine learning engineers can learn a lot about workflows from software engineering. Our intention with this book is to contribute to the standardization of machine learning projects by walking readers through an entire machine learning pipeline, end to end.

From our personal experience, most data science projects that aim to deploy models into production do not have the luxury of a large team. This makes it difficult to build an entire pipeline in-house from scratch. It may mean that machine learning projects turn into one-off efforts where performance degrades after time, the data scientist spends much of their time fixing errors when the underlying data changes, or the model is not used widely. An automated, reproducible pipeline reduces the effort required to deploy a model. The pipeline should include steps that:

• Version your data effectively and kick off a new model training run
• Validate the received data and check against data drift
• Efficiently preprocess data for your model training and validation
• Effectively train your machine learning models
• Track your model training
• Analyze and validate your trained and tuned models
• Deploy the validated model
• Scale the deployed model
• Capture new training data and model performance metrics with feedback loops

This list leaves out one important point: choosing the model architecture. We assume that you already have a good working knowledge of this step. If you are getting started with machine or deep learning, these resources are a great starting point to familiarize yourself with machine learning:

• Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms, 1st edition by Nikhil Buduma and Nicholas Locascio (O’Reilly)
• Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd edition by Aurélien Géron (O’Reilly)

Who Is This Book For?

The primary audience for the book is data scientists and machine learning engineers who want to go beyond training a one-off machine learning model and who want to successfully productize their data science projects. You should be comfortable with basic machine learning concepts and familiar with at least one machine learning framework (e.g., PyTorch, TensorFlow, Keras). The machine learning examples in
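
As a rough illustration of the pipeline steps listed in the preface, the following Python sketch models a subset of them as a directed acyclic graph and derives a run order from the dependencies. It is only a sketch under stated assumptions: the step names are hypothetical paraphrases of the preface's bullets, the strictly linear dependencies are assumed for simplicity, and the code uses only the standard library (Python 3.9+), not TFX or any orchestrator the book covers.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step names paraphrasing the preface's list. Each step maps to
# the set of steps that must finish before it can start (assumed linear here).
PIPELINE_DEPENDENCIES = {
    "version_data": set(),
    "validate_data": {"version_data"},
    "preprocess_data": {"validate_data"},
    "train_model": {"preprocess_data"},
    "analyze_and_validate_model": {"train_model"},
    "deploy_model": {"analyze_and_validate_model"},
    "collect_feedback": {"deploy_model"},
}


def execution_order(dependencies):
    """Return one valid run order; graphlib raises CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


def run_pipeline(dependencies, log=print):
    """Visit every step in dependency order; real work would replace the log call."""
    order = execution_order(dependencies)
    for position, step in enumerate(order, start=1):
        log(f"[{position}/{len(order)}] {step}")
    return order


if __name__ == "__main__":
    run_pipeline(PIPELINE_DEPENDENCIES)
```

Expressing the steps as a dependency graph rather than a hard-coded sequence means that adding a step only requires declaring what it depends on; the run order is then recomputed rather than maintained by hand.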
