Jason Hodson Applied Machine Learning Using Machine Learning to Solve Business Problems
Imprint This e-book is a publication many contributed to, specifically: Editor Megan Fuerst Acquisitions Editor Hareem Shafi Copyeditor Rachel Gibson Cover Design Silke Braun Photo Credit iStockphoto: 1219460734/© andresr; Shutterstock: 1704315859/© shuttersv Production E-Book Kelly O’Callaghan Typesetting E-Book III-satz, Germany We hope that you liked this e-book. Please share your feedback with us and read the Service Pages to find out how to contact us. The Library of Congress Cataloging-in-Publication Control Number for the printed edition is as follows: 2025053188 ISBN 978-1-4932-2758-7 (print) ISBN 978-1-4932-2759-4 (e-book) ISBN 978-1-4932-2760-0 (print and e-book) © 2026 by Rheinwerk Publishing Inc., Boston (MA) 1st edition 2026
Notes on Usage This e-book is protected by copyright. By purchasing this e-book, you have agreed to accept and adhere to the copyrights. You are entitled to use this e-book for personal purposes. You may print and copy it, too, but also only for personal use. Sharing an electronic or printed copy with others, however, is not permitted, neither as a whole nor in parts. Of course, making them available on the internet or in a company network is illegal as well. For detailed and legally binding usage conditions, please refer to the section Legal Notes.
Notes on the Screen Presentation You are reading this e-book in a file format (EPUB or Mobi) that makes the book content adaptable to the display options of your reading device and to your personal needs. That’s a great thing; but unfortunately not every device displays the content in the same way and the rendering of features such as pictures and tables or hyphenation can lead to difficulties. This e-book was optimized for the presentation on as many common reading devices as possible. If you want to zoom in on a figure (especially in iBooks on the iPad), tap the respective figure once. By tapping once again, you return to the previous screen. You can find more recommendations on the customization of the screen layout on the Service Pages.
Table of Contents
Notes on Usage
Table of Contents
Preface
1 Introduction
1.1 Aligning on Nomenclature
1.2 Learning to Google (or Prompt)
1.2.1 What Can You Find with Google?
1.2.2 Prompting
1.3 Predictions for Generative AI’s Impact on Machine Learning
1.4 Summary
2 Getting Started
2.1 GitHub
2.1.1 Creating an Account
2.1.2 GitHub in This Book
2.2 Anaconda
2.2.1 Creating an Account
2.2.2 Creating Projects and Uploading Data
2.2.3 Anaconda in This Book
2.3 Summary
3 Introduction to Our Use Cases
3.1 Importance of Understanding the Business Problem
3.1.1 Business Reviews
3.1.2 Definition of Success
3.2 Use Case 1: The Retail Tyrant
3.2.1 Details of the Request
3.2.2 History of the Request
3.2.3 Relationship with the Stakeholder
3.2.4 Use Case Questions
3.2.5 Use Case Answers
3.3 Use Case 2: Customer Retention
3.3.1 Details of the Request
3.3.2 History of the Request
3.3.3 Relationship with the Stakeholder
3.3.4 Use Case Questions
3.3.5 Use Case Answers
3.4 Use Case 3: Crime Predictions
3.4.1 Details of the Request
3.4.2 History of the Request
3.4.3 Relationship with the Stakeholder
3.4.4 Use Case Questions
3.4.5 Use Case Answers
3.5 Summary
4 Starting with the Data
4.1 Types of Data Sources
4.1.1 Manual
4.1.2 Automated
4.1.3 Data Sources for Our Use Cases
4.2 Data Exploration
4.2.1 Data Types
4.2.2 Data Visualization
4.2.3 Descriptive Statistics
4.2.4 Correlation Analysis
4.3 Data Cleaning (For Now)
4.3.1 Why Isn’t Data Already Clean?
4.3.2 Overview of Cleaning for Regression Models
4.3.3 Inaccurate Data
4.3.4 Missing Data
4.3.5 Dummy Coding
4.3.6 Dimensionality Reduction
4.4 Summary
5 Picking Your Model
5.1 The Simpler the Model, the Better
5.2 Model Decision Framework
5.2.1 How Important Is Interpretability?
5.2.2 How Many Rows and Columns?
5.2.3 What Is Being Predicted?
5.3 Train-Test Split
5.4 Regression Models
5.4.1 What Are Regression Models?
5.4.2 Multicollinearity
5.4.3 Linear Regression
5.4.4 Logistic Regression
5.5 Machine Learning Models
5.5.1 Decision Tree
5.5.2 Random Forest
5.5.3 Gradient Boosting Machine
5.6 Clustering
5.6.1 What Is Clustering?
5.6.2 Picking the Number of Clusters
5.6.3 Behind the Scenes of Clustering
5.7 Summary
6 Evaluating the Model and Iterating
6.1 Importance of Picking Validation Metrics
6.2 Validation Metrics
6.2.1 Accuracy
6.2.2 Confusion Matrix
6.2.3 Precision
6.2.4 Recall
6.2.5 F1 Score
6.2.6 Area Under the Curve
6.2.7 R-Squared
6.2.8 Mean Squared Error
6.2.9 Mean Absolute Error
6.2.10 Metric Summary
6.3 K-Fold Cross-Validation
6.4 Business Validations
6.4.1 Legal Considerations
6.4.2 Ethical Considerations
6.5 Machine Learning Interpretability
6.5.1 Regression Models
6.5.2 Tree-Based Models
6.6 Iterating on the Model
6.6.1 Feature Engineering
6.6.2 Remove Variables
6.6.3 Add New Data
6.7 Application to Use Cases
6.7.1 Use Case 1
6.7.2 Use Case 2
6.7.3 Use Case 3
6.8 Summary
7 Implementing, Monitoring, and Measuring the Model
7.1 Implementing Your Model for Predictions
7.1.1 Don’t Train the Model Each Time
7.1.2 Predictions for Our Use Cases
7.1.3 Saving Your Predictions
7.1.4 Practical Approaches to Consider
7.2 Model Monitoring
7.2.1 Importance of Model Monitoring
7.2.2 What to Monitor
7.2.3 Considerations for Model Monitoring
7.2.4 Retraining the Model
7.3 Measuring the Impact of Your Model
7.3.1 Business Sniff Test
7.3.2 Experiments
7.4 Summary
8 Closing Thoughts
8.1 Learning How to Learn with Generative AI
8.2 Learning How to Learn with Use Cases
8.3 Explore and Visualize Your Data
8.4 Cleaning Your Data and Dummy Coding
8.5 Machine Learning Models
8.6 Hyperparameters and Grid Search
8.7 Variable Lagging
8.8 The End
8.9 Acknowledgments
The Author
Index
Service Pages
Legal Notes
Preface

As I started my learning journey in the analytics space, I found a notable gap in the available resources. Introductory content is often too narrow in focus, and much of the existing material is not targeted toward early-stage learners. An important component of learning about analytics, and machine learning in particular, is developing a mental model of the connections among topics and themes. Therefore, this book is designed to help you understand which capabilities you should learn and how they fit together.

My goal with the following chapters is to give you a condensed master’s-level class in machine learning that you can apply immediately. I’ve intentionally incorporated storytelling components to make the content engaging. Technical material can be dry, so my hope is that these stories help you work through and retain the information!

Who Is This Book For?

If you’re already an expert in machine learning with experience using machine learning algorithms on the job, then this isn’t the book for you. That said, this book is also not a comprehensive guide to machine learning. It focuses on explaining the practical applications of machine learning, creating a foundation for the end-to-end machine learning process.
The content in this book assumes that you have some basic knowledge of Python, which is the only main prerequisite. For example, understanding the general syntax of pandas will be helpful. If you don’t have any experience with Python, I’d recommend consulting resources that best fit your learning style. A great introductory option is W3 Schools (https://www.w3schools.com/), where you can find high-level, Python-specific learning content for free.

This book was written with three primary personas in mind:

Analyst with some coding experience
Undergraduate student looking for applied experience
Nontechnical leader of an analytics team

Analyst with Some Coding Experience

Across my various corporate roles at large companies like Walmart and Allstate, I’ve seen this persona in various departments, domains, and career stages. Data analytics careers tend to attract curious self-learners. However, the sheer amount of content on what’s required for a successful data career can be incredibly overwhelming (speaking from personal experience). One recommendation to navigate this complexity is to find a technical mentor; see the upcoming text box.

Even with a technical mentor, I found it incredibly challenging to go from being a data analyst who knew how to write Python code to understanding how to properly leverage machine learning in my job. I was slowly building my knowledge with each capability in the process, but I didn’t understand how everything fit together, which made it really challenging to understand the “why” behind what I was
learning. It wasn’t until my master’s course in machine learning that all of the concepts began to fit together in my mental model.

I understand it’s not always practical to invest the money and time needed to get a master’s degree. If you fit into this persona, view this book as a fast and significantly discounted mini-master’s class in machine learning. As a successful data analyst, you already know how important understanding the business context is to effectively fulfill your role. The same is true when leveraging machine learning. Understanding the business problem and the data is more important than mastering the machine learning algorithms. You can build the best machine learning model known to man, but if it doesn’t answer the business problem, it’s useless!

Practical Advice: Find a Technical Mentor

I personally benefited from a number of technical mentors who were more than gracious with their time. If you’re early in your technical career, I recommend identifying someone who can help you on your journey. Ideally, this would be an individual within your organization, as they’ll be more familiar with your company’s data and technology. If you can’t find a mentor at your company, look for someone with similar experience on a platform like LinkedIn. I think you’ll find that people are happy to help!

Undergraduate Student Looking for Applied Experience

A common critique of entry-level analysts is that they don’t understand the business application of machine learning (more on this soon). The use cases in this book serve as practical experience you can add to your code portfolio, providing you with talking points
about applying machine learning to realistic business cases that you can use when interviewing for jobs. This also enables you to showcase an end-to-end machine learning example. The use cases in this book span from understanding the initial business problem to putting a model to use. This adds to your interview talking points and can help differentiate you from others interviewing for entry-level data science or data analyst roles.

Working with Entry-Level Data Scientists

A consistent observation I have when working with junior data scientists is their interest in the technical problem, not the business problem. (If you’re one of these individuals, don’t take this observation the wrong way!) If you’re in a research role in the data science and artificial intelligence (AI) field, this may be more appropriate. However, if you’re working in a corporate setting, you must be interested in the business problem because that’s where your role adds value.

I once worked with a junior data scientist who was incredible with Python and thoroughly understood the principles of machine learning and predictive modeling. However, their ability to understand the business problem they were trying to solve was a persistent challenge and they needed to be assisted continuously by senior members of the team. While junior employees aren’t expected to know everything off the bat, it’s important to combine your technical skills with an understanding of the bigger picture if you want to be successful in your career.
Nontechnical Leader of an Analytics Team

Many companies leverage rotational programs to give their leaders a well-rounded perspective. If you’re someone without a technical background finding yourself leading an analytics team, you can view this book as a crash course on the machine learning process. The code samples are likely optional for you. If you don’t have any previous coding experience, now is not the time to learn how to write Python code. If you do have some previous coding experience, running through the code may give you some additional street cred with your new team and can help you build trust with them.

Maureen Kalas, a former data science leader at Allstate, is a favorite leader of mine, and she fits this persona. She was often my inspiration for making this content accessible to other nontechnical leaders leading an analytics team. Here is a note from Maureen.

Advice for Team Leaders

Occasionally, business leaders take on stretch assignments, finding themselves leading teams with skills far different from their own. You may be one of these leaders, experienced in a particular business area with an appreciation for the power of analytics, which has now led you to your new role. Perhaps you’ll even lead the analytics team charged with solving the very business problems you faced before. This can be a perfect match!

The learning journey for a nontechnical analytics leader looks different from the path most data science leaders follow. You already have the business experience you need, but now you must determine just how much technical knowledge is required for you to be a successful leader. No one will look to you to code or to train junior analysts. However, you have three key responsibilities
to your business partners, all of which require some level of technical knowledge:

Understand the business opportunity and whether machine learning can help solve a critical problem.
Understand how the machine learning model will work in the real world.
Build trust and confidence in the solution with the business.

Business partners tend to be rightfully skeptical of new products. The nontechnical leader has an advantage in speaking the language of the business partner and understanding business processes. However, as the analytics leader, you’ll be challenged on every aspect of the technical solution you propose, and if you can’t sell your product, you’ll lose your business partner’s commitment.

So, what’s the right level of technical knowledge to acquire? Your job will be to ask good questions: both of your analysts, to guide development of the model and help them understand the business problem they are trying to solve, and of your business partners, to help them understand the benefits and risks of the technical solution. Your goals should include understanding the data, learning how the model works at a high level, and, most importantly, knowing what will trouble business leaders most about the solution.

Where can you gain this knowledge? I recommend starting with a resource that brings together the complex topics of analytics and machine learning, identifies common themes, and illustrates how it all fits together. This book masterfully guides you through the most important machine learning concepts and can serve as a springboard to go deeper when you’re ready for the next level.
Take advantage of free resources to supplement what you learn here (including a refresher in statistics!), and of course, work with your team to better understand your specific applications.

The Structure of This Book

This book is divided into eight chapters. The first three contain introductory material to get you acquainted with machine learning and the use cases discussed throughout the book. Then, in Chapter 4 through Chapter 7, we dive into the technical content. The final chapter concludes the material and offers key takeaways. Here is an overview of each chapter:

Chapter 1: Introduction
In this introductory chapter, we explain the value of applied machine learning and cover relevant topics such as learning how to troubleshoot code and understanding the impact of generative AI.

Chapter 2: Getting Started
In this chapter, we walk you through setting up the accounts you’ll need to follow along with the book’s content. This includes a free account with Anaconda, a cloud-based platform for writing and running code.

Chapter 3: Introduction to Our Use Cases
This chapter is, well, exactly what the title states. To help you understand the fundamentals of machine learning, the book employs three fictional use cases, which we introduce in this chapter. Each use case aligns to a specific dataset.
Chapter 4: Starting with the Data
This chapter is where we begin the focus of this book. We cover the most important concepts for preparing data for a machine learning model and apply them to the three use cases and their datasets.

Chapter 5: Picking Your Model
This chapter contains the core machine learning content of the book. We explore some of the most common models and work through examples and related considerations for each algorithm. After this chapter, you’ll have a hands-on understanding of how these algorithms work. We don’t dive headfirst into the statistics and math behind these models; instead, we focus on the practical components you should be aware of.

Chapter 6: Evaluating the Model and Iterating
This chapter focuses on error metrics and ways to interpret a machine learning model’s output. The error metrics are our way to keep a model in line with what we’re trying to get it to do. While this chapter falls more on the math side of this book’s content, it’s a critical component for building effective models. We’ll also cover how to interpret a machine learning model, which is incredibly useful in practice and also a fun peek behind the curtain of the model.

Chapter 7: Implementing, Monitoring, and Measuring the Model
This chapter focuses on generating predictions from your model, as well as steps to consider after your model is built. This includes monitoring your model as well as measuring its impact from a business perspective.

Chapter 8: Closing Thoughts
This chapter wraps the content in a bow and highlights key