Author: Gautam Kunapuli

Publisher: Manning Publications Co.
Publication Year: 2023
Language: English
Pages: 352
File Format: PDF
File Size: 24.7 MB
Text Preview (First 20 pages)
Ensemble Methods for Machine Learning
Gautam Kunapuli
MANNING
A taxonomy of ensembling techniques covered in this book

PARALLEL HOMOGENEOUS ENSEMBLES
Use many strong learners, or complex models, trained using the same base machine-learning algorithm. Ensemble diversity is created from a single algorithm with random data or feature sampling for training each base model. Ensembles in this family: bagging, random forests, pasting, random subspaces, random patches, extremely randomized trees (Extra Trees).

PARALLEL HETEROGENEOUS ENSEMBLES
Also use many strong learners, but each trained using a different base machine-learning algorithm. Ensemble diversity is created by using multiple training algorithms on the same data set and combining learners with different types of prediction aggregation. Ensembles in this family: majority voting, entropy-based prediction weighting, Dempster-Shafer prediction fusion, meta-learning for stacking and blending.

SEQUENTIAL ADAPTIVE BOOSTING ENSEMBLES
Use many weak learners, or simple models, trained in a stage-wise, sequential manner. Each successive model is trained to fix the mistakes made by the previously trained model, allowing the ensemble to adapt during training. The predictions of a large number of weak models are boosted into a strong model! Ensembles in this family: AdaBoost, LogitBoost.

SEQUENTIAL GRADIENT BOOSTING ENSEMBLES
Also use many weak learners trained in a stage-wise manner to emulate gradient descent over the task-specific loss function. Each successive model is trained to fit the residuals, or example-wise losses, of the previously trained model. Thus, each ensemble component is both an approximate gradient and a weak learner! Ensembles in this family: gradient boosting and LightGBM, Newton boosting and XGBoost, ordered boosting and CatBoost, explainable boosting models.
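To make this taxonomy concrete, here is a minimal sketch (not code from the book's chapters) of how one representative from each family might be built with scikit-learn; the synthetic data set, the choice of base estimators, and the hyperparameters are placeholder assumptions chosen only for illustration.

```python
# A minimal, self-contained sketch of the four ensemble families above using
# scikit-learn. The synthetic data, base estimators, and hyperparameters are
# illustrative placeholders, not settings prescribed by the book.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
Xtrn, Xtst, ytrn, ytst = train_test_split(X, y, random_state=42)

ensembles = {
    # Parallel homogeneous: the same base algorithm, diversified by random
    # data sampling (bagging) or data + feature sampling (random forests)
    'bagging': BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),
    'random forest': RandomForestClassifier(n_estimators=50),

    # Parallel heterogeneous: different base algorithms on the same data,
    # combined here by meta-learning (stacking)
    'stacking': StackingClassifier(
        estimators=[('lr', LogisticRegression(max_iter=1000)),
                    ('dt', DecisionTreeClassifier(max_depth=3)),
                    ('mlp', MLPClassifier(max_iter=2000))],
        final_estimator=LogisticRegression(max_iter=1000)),

    # Sequential adaptive boosting: weak learners (decision stumps by default)
    # trained stage-wise on reweighted examples
    'AdaBoost': AdaBoostClassifier(n_estimators=100),

    # Sequential gradient boosting: shallow trees fit stage-wise to the
    # residuals of the previous stage
    'gradient boosting': GradientBoostingClassifier(n_estimators=100,
                                                    max_depth=3),
}

for name, model in ensembles.items():
    model.fit(Xtrn, ytrn)
    print(f'{name}: test accuracy = {model.score(Xtst, ytst):.3f}')
```

The gradient-boosting frameworks named above also ship scikit-learn-compatible estimators (for example, LGBMClassifier, XGBClassifier, and CatBoostClassifier), so they can be dropped into a loop like this one in much the same way.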
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

©2023 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

The author and publisher have made every effort to ensure that the information in this book was correct at press time. The author and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editors: Katherine Olstein and Karen Miller
Technical development editor: Alain Couniot
Review editor: Mihaela Batinic
Production editor: Kathleen Rossland
Copy editor: Julie McNamee
Proofreader: Katie Tennant
Technical proofreader: Manish Jain
Typesetter and cover designer: Marija Tudor

ISBN 9781617297137
Printed in the United States of America
To my cousin Bhima, who inspired me to board a plane and go far away from home, who made grad school look glamorous (it wasn’t, but was worth it), without whose example, my own journey would have been very different, and this book would probably not exist. Wish you were here.
contents

preface
acknowledgments
about this book
about the author
about the cover illustration

PART 1  THE BASICS OF ENSEMBLES

1 Ensemble methods: Hype or hallelujah?
  1.1 Ensemble methods: The wisdom of the crowds
  1.2 Why you should care about ensemble learning
  1.3 Fit vs. complexity in individual models
      Regression with decision trees ■ Regression with support vector machines
  1.4 Our first ensemble
  1.5 Terminology and taxonomy for ensemble methods

PART 2  ESSENTIAL ENSEMBLE METHODS

2 Homogeneous parallel ensembles: Bagging and random forests
  2.1 Parallel ensembles
  2.2 Bagging: Bootstrap aggregating
      Intuition: Resampling and model aggregation ■ Implementing bagging ■ Bagging with scikit-learn ■ Faster training with parallelization
  2.3 Random forests
      Randomized decision trees ■ Random forests with scikit-learn ■ Feature importances
  2.4 More homogeneous parallel ensembles
      Pasting ■ Random subspaces and random patches ■ Extra Trees
  2.5 Case study: Breast cancer diagnosis
      Loading and preprocessing ■ Bagging, random forests, and Extra Trees ■ Feature importances with random forests

3 Heterogeneous parallel ensembles: Combining strong learners
  3.1 Base estimators for heterogeneous ensembles
      Fitting base estimators ■ Individual predictions of base estimators
  3.2 Combining predictions by weighting
      Majority vote ■ Accuracy weighting ■ Entropy weighting ■ Dempster-Shafer combination
  3.3 Combining predictions by meta-learning
      Stacking ■ Stacking with cross validation
  3.4 Case study: Sentiment analysis
      Preprocessing ■ Dimensionality reduction ■ Blending classifiers

4 Sequential ensembles: Adaptive boosting
  4.1 Sequential ensembles of weak learners
  4.2 AdaBoost: Adaptive boosting
      Intuition: Learning with weighted examples ■ Implementing AdaBoost ■ AdaBoost with scikit-learn
  4.3 AdaBoost in practice
      Learning rate ■ Early stopping and pruning
  4.4 Case study: Handwritten digit classification
      Dimensionality reduction with t-SNE ■ Boosting
  4.5 LogitBoost: Boosting with the logistic loss
      Logistic vs. exponential loss functions ■ Regression as a weak learning algorithm for classification ■ Implementing LogitBoost

5 Sequential ensembles: Gradient boosting
  5.1 Gradient descent for minimization
      Gradient descent with an illustrative example ■ Gradient descent over loss functions for training
  5.2 Gradient boosting: Gradient descent + boosting
      Intuition: Learning with residuals ■ Implementing gradient boosting ■ Gradient boosting with scikit-learn ■ Histogram-based gradient boosting
  5.3 LightGBM: A framework for gradient boosting
      What makes LightGBM “light”? ■ Gradient boosting with LightGBM
  5.4 LightGBM in practice
      Learning rate ■ Early stopping ■ Custom loss functions
  5.5 Case study: Document retrieval
      The LETOR data set ■ Document retrieval with LightGBM

6 Sequential ensembles: Newton boosting
  6.1 Newton’s method for minimization
      Newton’s method with an illustrative example ■ Newton’s descent over loss functions for training
  6.2 Newton boosting: Newton’s method + boosting
      Intuition: Learning with weighted residuals ■ Intuition: Learning with regularized loss functions ■ Implementing Newton boosting
  6.3 XGBoost: A framework for Newton boosting
      What makes XGBoost “extreme”? ■ Newton boosting with XGBoost
  6.4 XGBoost in practice
      Learning rate ■ Early stopping
  6.5 Case study redux: Document retrieval
      The LETOR data set ■ Document retrieval with XGBoost

PART 3  ENSEMBLES IN THE WILD: ADAPTING ENSEMBLE METHODS TO YOUR DATA

7 Learning with continuous and count labels
  7.1 A brief review of regression
      Linear regression for continuous labels ■ Poisson regression for count labels ■ Logistic regression for classification labels ■ Generalized linear models ■ Nonlinear regression
  7.2 Parallel ensembles for regression
      Random forests and Extra Trees ■ Combining regression models ■ Stacking regression models
  7.3 Sequential ensembles for regression
      Loss and likelihood functions for regression ■ Gradient boosting with LightGBM and XGBoost
  7.4 Case study: Demand forecasting
      The UCI Bike Sharing data set ■ GLMs and stacking ■ Random forest and Extra Trees ■ XGBoost and LightGBM

8 Learning with categorical features
  8.1 Encoding categorical features
      Types of categorical features ■ Ordinal and one-hot encoding ■ Encoding with target statistics ■ The category_encoders package
  8.2 CatBoost: A framework for ordered boosting
      Ordered target statistics and ordered boosting ■ Oblivious decision trees ■ CatBoost in practice
  8.3 Case study: Income prediction
      Adult Data Set ■ Creating preprocessing and modeling pipelines ■ Category encoding and ensembling ■ Ordered encoding and boosting with CatBoost
  8.4 Encoding high-cardinality string features

9 Explaining your ensembles
  9.1 What is interpretability?
      Black-box vs. glass-box models ■ Decision trees (and decision rules) ■ Generalized linear models
  9.2 Case study: Data-driven marketing
      Bank Marketing data set ■ Training ensembles ■ Feature importances in tree ensembles
  9.3 Black-box methods for global explainability
      Permutation feature importance ■ Partial dependence plots ■ Global surrogate models
  9.4 Black-box methods for local explainability
      Local surrogate models with LIME ■ Local interpretability with SHAP
  9.5 Glass-box ensembles: Training for interpretability
      Explainable boosting machines ■ EBMs in practice

epilogue
  Further reading
      Practical ensemble methods ■ Theory and foundations of ensemble methods ■ A few more advanced topics ■ Ensemble methods for statistical relational learning ■ Ensemble methods for deep learning
  Thank you!

index
preface

Once upon a time, I was a graduate student, adrift and rudderless in an ocean of unfulfilling research directions and uncertain futures. Then I stumbled upon a remarkable article titled “Support Vector Machines: Hype or Hallelujah?” This being the early 2000s, support vector machines (SVMs) were, of course, the preeminent machine-learning technique of the time. In the article, the authors (one of whom would later become my PhD advisor) took a rather reductionist approach to explaining the considerably complex topic of SVMs, interleaving intuition and geometry with theory and application. The article made a powerful impression on me, at once igniting a lifelong fascination with machine learning and an obsession with understanding how such methods work under the hood. Indeed, the title of the first chapter pays homage to that paper that had so profound an influence over my life.

Much like SVMs then, ensemble methods are widely considered a preeminent machine-learning technique today. But what many people don’t realize is that some ensemble method or another has always been considered state of the art over the decades: bagging in the 1990s, random forests and boosting in the 2000s, gradient boosting in the 2010s, and XGBoost in the 2020s. In the ever-mutable world of the best machine-learning models, ensemble methods, it seems, are indeed worth the hype.

I’ve been fortunate to spend a good deal of the past decade training many kinds of ensemble models, making industry applications out of them, and writing academic research papers on them. In this book, I try to showcase as many of these ensemble methods as possible: some that you’ve definitely heard of and some new ones that you should really hear about.

This book was never intended to be just a tutorial with step-by-step instructions and cut-and-paste code (although you can use it that way, too). There are dozens of such fantastic tutorials on the web, and they can get you going on your data set in an instant. Instead, I talk about each new method using an immersive approach inspired by that first machine-learning paper I ever read and refined in college classrooms during my time as a graduate lecturer. I’ve always felt that to understand a technical topic deeply, it helps to strip it down, take it apart, and try to put it back together again. I adopt the same approach in this book: we’ll take ensemble methods apart and (re)create them ourselves. We’ll tweak them and poke them to see how they change. And, in doing so, we’ll see exactly what makes them tick! I hope this book will be helpful in demystifying those technical and algorithmic details and get you into the ensemble mindset, be it for your class project, Kaggle competition, or production-quality application.
acknowledgments

I never thought that a book on ensemble methods would itself turn into an ensemble effort of family and friends, colleagues, and collaborators, all of whom had a lot to do with this book, from conception to completion.

To Brian Sawyer, who let me pitch the idea of this book, for believing in this project, for being patient, and for keeping me on track: thank you for giving me this opportunity to do this thing that I’ve always wanted to do.

To my first development editor, Katherine Olstein, second development editor, Karen Miller, and technical development editor, Alain Couniot: I had a vision for what this book would look like when I started, and you helped make it better. Thank you for the hours and days of meticulous reviews, for your eagle-eyed edits, and for challenging me always to be a better writer. Your efforts have much to do with the final quality of this book.

To Manish Jain: thank you for painstakingly proofreading the code line by line. To Marija Tudor: thank you for designing this absolutely fantastic cover (which I still think is the best part of this book), for making it orange at my request, and for typesetting it from cover to cover. To the proofing and production team at Manning: thank you for your exceptional craft—this book looks perfect—review editor Mihaela Batinic, production editor Kathleen Rossland, copy editor Julie McNamee, and proofreader Katie Tennant.

To my reviewers, Al Krinker, Alain Lompo, Biswanath Chowdhury, Chetan Saran Mehra, Eric Platon, Gustavo A. Patino, Joaquin Beltran, Lucian Mircea Sasu, Manish Jain, McHugson Chambers, Ninoslav Cerkez, Noah Flynn, Oliver Korten, Or Golan, Peter V. Henstock, Philip Best, Sergio Govoni, Simon Seyag, Stephen John Warnett, Subhash Talluri, Todd Cook, and Xiangbo Mao: thank you for your fabulous feedback and some truly terrific insights and comments. I tried to take in all of your advice (I really did), and much of it has worked its way into the book.

To the readers who read the book during early access and who left many comments, corrections, and words of encouragement—you know who you are—thank you for the support!

To my mentors, Kristin Bennett, Jong-Shi Pang, Jude Shavlik, Sriraam Natarajan, and Maneesh Singh, who have each shaped my thinking profoundly at different stages of my journey as a student, postdoc, professor, and professional: thank you for teaching me how to think in machine learning, how to speak machine learning, and how to build with machine learning. Much of your wisdom and many of your lessons endure in this book. And Kristin, I hope you like the title of the first chapter.

To Jenny and Guilherme de Oliveira, for your friendship over the years, but especially during the great pandemic, when much of this book was written: thank you for keeping me sane. I will always treasure our afternoons and evenings in that summer and fall of 2020, tucked away in your little backyard, our pod and sanctuary.

To my parents, Vijaya and Shivakumar, and my brother, Anupam: thank you for always believing in me, and for always supporting me, even from tens of thousands of miles away. I know you’re proud of me. This book is finally finished, and now we can do all those other things we’re always talking about . . . until I start writing the next one, anyway.

To my wife, best friend, and biggest champion, Kristine: you’ve been an inexhaustible source of comfort and encouragement, especially when things got tough. Thank you for bouncing ideas with me, for proofreading with me, for the tea and snacks, for the Gus, for sacrificing all those weekends (and, sometimes, weeknights) when I was writing. Thank you for hanging in there with me, for always being there for me, and for never once doubting that I could do this. I love you!
about this book

There has never been a better time to learn about ensemble methods. The models covered in this book fall into three broad categories:

■ Foundational ensemble methods—The classics that everyone has heard of, including historical ensemble techniques such as bagging, random forests, and AdaBoost
■ State-of-the-art ensemble methods—The tried and tested powerhouses of the modern ensemble era that form the core of many real-world, in-production prediction, recommendation, and search systems
■ Emerging ensemble methods—The latest methods fresh out of the research foundries to handle new needs and emerging priorities such as explainability and interpretability

Each chapter will introduce a different ensembling technique, using a three-pronged approach. First, you’ll learn the intuition behind each ensemble method by visualizing step by step how learning actually takes place. Second, you’ll implement a basic version of each ensemble method yourself to fully understand the algorithmic nuts and bolts. Third, you’ll learn how to apply powerful ensemble libraries and tools practically.

Most chapters also come with their own case study on real-world data, drawn from applications such as handwritten digit prediction, recommendation systems, sentiment analysis, demand forecasting, and others. These case studies tackle several real-world issues where appropriate, including preprocessing and feature engineering, hyperparameter selection, efficient training techniques, and effective model evaluation.

Who should read this book

This book is intended for a broad audience:

■ Data scientists who are interested in using ensemble methods to get the best out of their data for real-world applications
■ MLOps and DataOps engineers who are building, evaluating, and deploying ensemble-based, production-ready applications and pipelines
■ Students of data science and machine learning who want to use this book as a learning resource or as a practical reference to supplement textbooks
■ Kagglers and data science enthusiasts who can use this book as an entry point into learning about the endless modeling possibilities with ensemble methods

This book is not an introduction to machine learning and data science. This book assumes that you have some basic working knowledge of machine learning and that you’ve used or played around with at least one fundamental learning technique (e.g., decision trees).

A basic working knowledge of Python is also assumed. Examples, visualizations, and chapter case studies all use Python and Jupyter Notebooks. Knowledge of other commonly used Python packages such as NumPy (for mathematical computations), pandas (for data manipulation), and Matplotlib (for visualization) is useful, but not necessary. In fact, you can learn how to use these packages through the examples and case studies.

How this book is organized: A road map

This book is organized into nine chapters in three parts. Part 1 is a gentle introduction to ensemble methods, part 2 introduces and explains several essential ensemble methods, and part 3 covers advanced topics.

Part 1, “The basics of ensembles,” introduces ensemble methods and why you should care about them. This part also contains a road map of ensemble methods covered in the rest of the book:

■ Chapter 1 discusses ensemble methods and basic ensemble terminology. It also introduces the fit-versus-complexity tradeoff (or the bias-variance tradeoff, as it’s more formally called). You’ll build your very first ensemble in this chapter.

Part 2, “Essential ensemble methods,” covers several important families of ensemble methods, many of which are considered “essential” and are widely used in real-world applications. In each chapter, you’ll learn how to implement different ensemble methods from scratch, how they work, and how to apply them to real-world problems:

■ Chapter 2 begins our journey with parallel ensemble methods, specifically, parallel homogeneous ensembles. Ensemble methods covered include bagging, random forests, pasting, random subspaces, random patches, and Extra Trees.
■ Chapter 3 continues the journey with more parallel ensembles, but the focus in this chapter is on parallel heterogeneous ensembles. Ensemble methods covered include combining base models by majority voting, combining by weighting, prediction fusion with Dempster-Shafer, and meta-learning by stacking.
■ Chapter 4 introduces another family of ensemble methods—sequential adaptive ensembles—in particular, the fundamental concept of boosting many weak models into one powerful model. Ensemble methods covered include AdaBoost and LogitBoost.
■ Chapter 5 builds on the foundational concepts of boosting and covers another fundamental sequential ensemble method, gradient boosting, which combines gradient descent with boosting. This chapter discusses how we can train gradient-boosting ensembles with scikit-learn and LightGBM.
■ Chapter 6 continues to explore sequential ensemble methods with Newton boosting, an efficient and effective extension of gradient boosting that combines Newton’s descent with boosting. This chapter discusses how we can train Newton boosting ensembles with XGBoost.

Part 3, “Ensembles in the wild: Adapting ensemble methods to your data,” shows you how to apply ensemble methods to many scenarios, including data sets with continuous and count-valued labels and data sets with categorical features. You’ll also learn how to interpret your ensembles and explain their predictions:

■ Chapter 7 shows how we can train ensembles for different types of regression problems and generalized linear models, where training labels are continuous- or count-valued. Parallel and sequential ensembles for linear regression, Poisson regression, gamma regression, and Tweedie regression are covered.
■ Chapter 8 identifies challenges in learning with nonnumeric features, specifically, categorical features, and encoding schemes that will help us train effective ensembles for this kind of data. This chapter also discusses two important practical issues: data leakage and prediction shift. Finally, we’ll see how to overcome these issues with ordered boosting and CatBoost.
■ Chapter 9 covers the newly emerging and very important topic of explainable AI from the perspective of ensemble methods. This chapter introduces the notion of explainability and why it’s important. Several common black-box explainability methods are also discussed, including permutation feature importance, partial dependence plots, surrogate methods, Locally Interpretable Model-Agnostic Explanation, Shapley values, and SHapley Additive exPlanations. The glass-box ensemble method, explainable boosting machines, and the InterpretML package are also introduced.
■ The epilogue concludes our journey with additional topics for further exploration and reading.

While most of the chapters in the book can reasonably be read in a standalone manner, chapters 7, 8, and 9 build on part 2 of the book.

About the code

All the code and examples in this book are written in Python 3. The code is organized into Jupyter Notebooks and is available in an online GitHub repository (https://github.com/gkunapuli/ensemble-methods-notebooks) and for download from the Manning website (www.manning.com/books/ensemble-methods-for-machine-learning). You can get executable snippets of code from the liveBook (online) version of this book at https://livebook.manning.com/book/ensemble-methods-for-machine-learning.

Several Python scientific and visualization libraries are also used, including NumPy (https://numpy.org/), SciPy (https://scipy.org/), pandas (https://pandas.pydata.org/), and Matplotlib (https://matplotlib.org/). The code also uses several Python machine-learning and ensemble-method libraries, including scikit-learn (https://scikit-learn.org/stable/), LightGBM (https://lightgbm.readthedocs.io/), XGBoost (https://xgboost.readthedocs.io/), CatBoost (https://catboost.ai/), and InterpretML (https://interpret.ml/).

This book contains many examples of source code both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.

liveBook discussion forum

Purchase of Ensemble Methods for Machine Learning includes free access to liveBook, Manning’s online reading platform. Using liveBook’s exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It’s a snap to make notes for yourself, ask and answer technical questions, and receive help from the author and other users. To access the forum, go to https://livebook.manning.com/book/ensemble-methods-for-machine-learning/discussion. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/discussion.

Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It’s not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the author some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
about the author

GAUTAM KUNAPULI has more than 15 years of experience in both academia and the machine-learning industry. His work focuses on human-in-the-loop learning, knowledge-based and advice-taking learning algorithms, and scalable learning for difficult machine-learning problems. Gautam has developed several novel algorithms for diverse application domains, including social network analysis, text and natural language processing, computer vision, behavior mining, educational data mining, insurance and financial analytics, and biomedical applications. He has also published papers exploring ensemble methods in relational domains and with imbalanced data.
about the cover illustration

The figure on the cover of Ensemble Methods for Machine Learning is “Huonv ou Musiciene Chinoise,” or “Huonv or Chinese musician,” from a collection by Jacques Grasset de Saint-Sauveur, published in 1788. Each illustration is finely drawn and colored by hand.

In those days, it was easy to identify where people lived and what their trade or station in life was just by their dress. Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional culture centuries ago, brought back to life by pictures from collections such as this one.