Deep Learning for the Life Sciences: Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More

Author: Bharath Ramsundar, Peter Eastman, Patrick Walters, Vijay Pande

Deep learning has already achieved remarkable results in many fields. Now it’s making waves throughout the sciences broadly and the life sciences in particular. This practical book teaches developers and scientists how to use deep learning for genomics, chemistry, biophysics, microscopy, medical analysis, and other fields.

Ideal for practicing developers and scientists ready to apply their skills to scientific applications such as biology, genetics, and drug discovery, this book introduces several deep network primitives. You’ll follow a case study on the problem of designing new therapeutics that ties together physics, chemistry, biology, and medicine. This example represents one of science’s greatest challenges.

• Learn the basics of performing machine learning on molecular data
• Understand why deep learning is a powerful tool for genetics and genomics
• Apply deep learning to understand biophysical systems
• Get a brief introduction to machine learning with DeepChem
• Use deep learning to analyze microscopic images
• Analyze medical scans using deep learning techniques
• Learn about variational autoencoders and generative adversarial networks
• Interpret what your model is doing and how it’s working

📄 File Format: PDF

💾 File Size: 24.2 MB

📄 Text Preview (First 20 pages)

Bharath Ramsundar, Peter Eastman, Patrick Walters, and Vijay Pande
Deep Learning for the Life Sciences
Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More
Beijing Boston Farnham Sebastopol Tokyo

978-1-492-03983-9
[LSI]

Deep Learning for the Life Sciences
by Bharath Ramsundar, Peter Eastman, Patrick Walters, and Vijay Pande

Copyright © 2019 Bharath Ramsundar, Peter Eastman, Patrick Walters, and Vijay Pande. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Development Editor: Nicole Tache
Acquisitions Editor: Mike Loukides
Production Editor: Katherine Tozer
Copyeditor: Rachel Head
Proofreader: Zachary Corleissen
Indexer: Ellen Troutman-Zaig
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

April 2019: First Edition

Revision History for the First Edition
2019-03-27: First Release

See http://bit.ly/deep-learning-life-science for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Deep Learning for the Life Sciences, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors, and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

Table of Contents

Preface
1. Why Life Science?
  Why Deep Learning?
  Contemporary Life Science Is About Data
  What Will You Learn?
2. Introduction to Deep Learning
  Linear Models
  Multilayer Perceptrons
  Training Models
  Validation
  Regularization
  Hyperparameter Optimization
  Other Types of Models
  Convolutional Neural Networks
  Recurrent Neural Networks
  Further Reading
3. Machine Learning with DeepChem
  DeepChem Datasets
  Training a Model to Predict Toxicity of Molecules
  Case Study: Training an MNIST Model
  The MNIST Digit Recognition Dataset
  A Convolutional Architecture for MNIST
  Conclusion
4. Machine Learning for Molecules
  What Is a Molecule?
  What Are Molecular Bonds?
  Molecular Graphs
  Molecular Conformations
  Chirality of Molecules
  Featurizing a Molecule
  SMILES Strings and RDKit
  Extended-Connectivity Fingerprints
  Molecular Descriptors
  Graph Convolutions
  Training a Model to Predict Solubility
  MoleculeNet
  SMARTS Strings
  Conclusion
5. Biophysical Machine Learning
  Protein Structures
  Protein Sequences
  A Short Primer on Protein Binding
  Biophysical Featurizations
  Grid Featurization
  Atomic Featurization
  The PDBBind Case Study
  PDBBind Dataset
  Featurizing the PDBBind Dataset
  Conclusion
6. Deep Learning for Genomics
  DNA, RNA, and Proteins
  And Now for the Real World
  Transcription Factor Binding
  A Convolutional Model for TF Binding
  Chromatin Accessibility
  RNA Interference
  Conclusion
7. Machine Learning for Microscopy
  A Brief Introduction to Microscopy
  Modern Optical Microscopy
  The Diffraction Limit
  Electron and Atomic Force Microscopy
  Super-Resolution Microscopy
  Deep Learning and the Diffraction Limit?
  Preparing Biological Samples for Microscopy
  Staining
  Sample Fixation
  Sectioning Samples
  Fluorescence Microscopy
  Sample Preparation Artifacts
  Deep Learning Applications
  Cell Counting
  Cell Segmentation
  Computational Assays
  Conclusion
8. Deep Learning for Medicine
  Computer-Aided Diagnostics
  Probabilistic Diagnoses with Bayesian Networks
  Electronic Health Record Data
  The Dangers of Large Patient EHR Databases?
  Deep Radiology
  X-Ray Scans and CT Scans
  Histology
  MRI Scans
  Learning Models as Therapeutics
  Diabetic Retinopathy
  Conclusion
  Ethical Considerations
  Job Losses
  Summary
9. Generative Models
  Variational Autoencoders
  Generative Adversarial Networks
  Applications of Generative Models in the Life Sciences
  Generating New Ideas for Lead Compounds
  Protein Design
  A Tool for Scientific Discovery
  The Future of Generative Modeling
  Working with Generative Models
  Analyzing the Generative Model’s Output
  Conclusion
10. Interpretation of Deep Models
  Explaining Predictions
  Optimizing Inputs
  Predicting Uncertainty
  Interpretability, Explainability, and Real-World Consequences
  Conclusion
11. A Virtual Screening Workflow Example
  Preparing a Dataset for Predictive Modeling
  Training a Predictive Model
  Preparing a Dataset for Model Prediction
  Applying a Predictive Model
  Conclusion
12. Prospects and Perspectives
  Medical Diagnosis
  Personalized Medicine
  Pharmaceutical Development
  Biology Research
  Conclusion
Index

Preface

In recent years, life science and data science have converged. Advances in robotics and automation have enabled chemists and biologists to generate enormous amounts of data. Scientists today are capable of generating more data in a day than their predecessors 20 years ago could have generated in an entire career. This ability to rapidly generate data has also created a number of new scientific challenges. We are no longer in an era where data can be processed by loading it into a spreadsheet and making a couple of graphs. In order to distill scientific knowledge from these datasets, we must be able to identify and extract nonobvious relationships.

One technique that has emerged over the last few years as a powerful tool for identifying patterns and relationships in data is deep learning, a class of algorithms that have revolutionized approaches to problems such as image analysis, language translation, and speech recognition. Deep learning algorithms excel at identifying and exploiting patterns in large datasets. For these reasons, deep learning has broad applications across life science disciplines. This book provides an overview of how deep learning has been applied in a number of areas including genetics, drug discovery, and medical diagnosis. Many of the topics we describe are accompanied by code examples that provide a practical introduction to the methods and give the reader a starting point for future research and exploration.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
Shows commands or other text that should be typed literally by the user.

Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/deepchem/DeepLearningLifeSciences.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Deep Learning for the Life Sciences by Bharath Ramsundar, Peter Eastman, Patrick Walters, and Vijay Pande (O’Reilly). Copyright 2019 Bharath Ramsundar, Karl Leswing, Peter Eastman, and Vijay Pande, 978-1-492-03983-9.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

For almost 40 years, O’Reilly has provided technology and business training, knowledge, and insight to help companies succeed. Our unique network of experts and innovators share their knowledge and expertise through books, articles, conferences, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, please visit http://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://bit.ly/deep-lrng-for-life-science.

To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

We would like to thank Nicole Tache, our editor at O’Reilly, as well as the tech reviewers and beta reviewers for their valuable contributions to the book. In addition, we would like to thank Karl Leswing and Zhenqin (Michael) Wu for their contributions to the code and Johnny Israeli for valuable advice on the genomics chapter.

Bharath would like to thank his family for their support and encouragement during many long weekends and nights working on this book.

Peter would like to thank his wife for her constant support, as well as the many colleagues from whom he has learned so much about machine learning.

Pat would like to thank his wife Andrea, and his daughters Alee and Maddy, for their love and support. He would also like to acknowledge past and present colleagues at Vertex Pharmaceuticals and Relay Therapeutics, from whom he has learned so much.

Finally, we want to thank the DeepChem open source community for their encouragement and support throughout this project.

CHAPTER 1
Why Life Science?

While there are many directions that those with a technical inclination and a passion for data can pursue, few areas can match the impact of biomedical research. The advent of modern medicine has fundamentally changed the nature of human existence. Over the last 20 years, we have seen innovations that have transformed the lives of countless individuals. When it first appeared in 1981, HIV/AIDS was a largely fatal disease. Continued development of antiretroviral therapies has dramatically extended the life expectancy for patients in the developed world. Other diseases, such as hepatitis C, which was considered largely untreatable a decade ago, can now be cured. Advances in genetics are enabling the identification and, hopefully soon, the treatment of a wide array of diseases. Innovations in diagnostics and instrumentation have enabled physicians to specifically identify and target disease in the human body. Many of these breakthroughs have benefited from, and will continue to be advanced by, computational methods.

Why Deep Learning?

Machine learning algorithms are now a key component of everything from online shopping to social media. Teams of computer scientists are developing algorithms that enable digital assistants such as the Amazon Echo or Google Home to understand speech. Advances in machine learning have enabled routine on-the-fly translation of web pages between human languages. Beyond everyday life, machine learning has also had an impact on many areas of the physical and life sciences. Algorithms are being applied to everything from the detection of new galaxies in telescope images to the classification of subatomic interactions at the Large Hadron Collider.

One of the drivers of these technological advances has been the development of a class of machine learning methods known as deep neural networks. While the technological underpinnings of artificial neural networks were developed in the 1950s and refined in the 1980s, the true power of the technique wasn’t fully realized until advances in computer hardware became available over the last 10 years. We will provide a more complete overview of deep neural networks in the next chapter, but it is important to acknowledge some of the advances that have occurred through the application of deep learning:

• Many of the developments in speech recognition that have become ubiquitous in cell phones, computers, televisions, and other internet-connected devices have been driven by deep learning.
• Image recognition is a key component of self-driving cars, internet search, and other applications. Many of the same developments in deep learning that drove consumer applications are now being used in biomedical research, for example, to classify tumor cells into different types.
• Recommender systems have become a key component of the online experience. Companies like Amazon use deep learning to drive their “customers who bought this also bought” approach to encouraging additional purchases. Netflix uses a similar approach to recommend movies that an individual may want to watch. Many of the ideas behind these recommender systems are being used to identify new molecules that may provide starting points for drug discovery efforts.
• Language translation was once the domain of very complex rule-based systems. Over the last few years, systems driven by deep learning have outperformed systems that had undergone years of manual curation. Many of the same ideas are now being used to extract concepts from the scientific literature and alert scientists to journal articles that they may have missed.

These are just a few of the innovations that have come about through the application of deep learning methods. We are at an interesting time when we have a convergence of widely available scientific data and methods for processing that data.
Those with the ability to combine data with new methods for learning from patterns in that data can make significant scientific advances.

Contemporary Life Science Is About Data

As mentioned previously, the fundamental nature of life science has changed. The availability of robotics and miniaturized experiments has brought about dramatic increases in the amount of experimental data that can be generated. In the 1980s a biologist would perform a single experiment and generate a single result. This sort of data could typically be manipulated by hand with the possible assistance of a pocket calculator. If we fast-forward to today’s biology, we have instrumentation that is capable of generating millions of experimental data points in a day or two. Experiments like gene sequencing, which can generate huge datasets, have become inexpensive and routine.

The advances in gene sequencing have led to the construction of databases that link an individual’s genetic code to a multitude of health-related outcomes, including diabetes, cancer, and genetic diseases such as cystic fibrosis. By using computational techniques to analyze and mine this data, scientists are developing an understanding of the causes of these diseases and using this understanding to develop new treatments.

Disciplines that once relied primarily on human observation are now utilizing datasets that simply could not be analyzed manually. Machine learning is now routinely used to classify images of cells. The output of these machine learning models is used to identify and classify cancerous tumors and to evaluate the effects of potential disease treatments.

Advances in experimental techniques have led to the development of several databases that catalog the structures of chemicals and the effects that these chemicals have on a wide range of biological processes or activities. These structure–activity relationships (SARs) form the basis of a field known as chemical informatics, or cheminformatics. Scientists mine these large datasets and use the data to build predictive models that will drive the next generation of drug development.

With these large amounts of data comes a need for a new breed of scientist who is comfortable in both the scientific and computational domains. Those with these hybrid capabilities have the potential to unlock structure and trends in large datasets and to make the scientific discoveries of tomorrow.

What Will You Learn?

In the first few chapters of this book, we provide an overview of deep learning and how it can be applied in the life sciences. We begin with machine learning, which has been defined as “the science (and art) of programming computers so that they can learn from data.”1

1 Furbush, James. “Machine Learning: A Quick and Simple Definition.” https://www.oreilly.com/ideas/machine-learning-a-quick-and-simple-definition. 2018.

Chapter 2 provides a brief introduction to deep learning. We begin with an example of how this type of machine learning can be used to perform a simple task like linear regression, and progress to more sophisticated models that are commonly used to solve real-world problems in the life sciences. Machine learning typically begins by splitting a dataset into a training set that is used to generate a model and a test set that is used to assess the performance of the model. In Chapter 2 we discuss some of the details surrounding the training and validation of predictive models. A model’s performance can typically be improved by adjusting a number of settings known as hyperparameters, which are chosen before training rather than learned from the data. The chapter provides an overview of this process. Deep learning is not a single technique, but a set of related methods. Chapter 2 concludes with an introduction to a few of the most important deep learning variants.

In Chapter 3, we introduce DeepChem, an open source programming library that has been specifically designed to simplify the creation of deep learning models for a variety of life science applications. After providing an overview of DeepChem, we introduce our first programming example, which demonstrates how the DeepChem library can be used to generate a model for predicting the toxicity of molecules. In a second programming example, we show how DeepChem can be used to classify images, a common task in modern biology. As briefly mentioned earlier, deep learning is used in a variety of imaging applications, ranging from cancer diagnosis to the detection of glaucoma. This discussion of specific applications then motivates an explanation of some of the inner workings of deep learning methods.

Chapter 4 provides an overview of how machine learning can be applied to molecules. We begin by introducing molecules, the building blocks of everything around us. Although molecules can be considered analogous to building blocks, they are not rigid. Molecules are flexible and exhibit dynamic behavior. In order to characterize molecules using a computational method like deep learning, we need to find a way to represent molecules in a computer. These encodings are analogous to the way in which an image can be represented as a set of pixels. In the second half of Chapter 4, we describe a number of ways that molecules can be represented and how these representations can be used to build deep learning models.
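The train/test workflow described above for Chapter 2 (fit a simple linear regression on a training set, then judge it on a held-out test set) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not code from the book or from DeepChem: the synthetic data (y = 3x + 1 plus Gaussian noise), the 80/20 split, and the helper names `split_dataset`, `fit_line`, and `mse` are hypothetical choices made for clarity.

```python
import random


def split_dataset(pairs, test_fraction=0.2, seed=0):
    """Shuffle (x, y) pairs, then hold out a fraction of them as a test set."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # Training set first, test set second.
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(pairs):
    """Ordinary least-squares fit of y = w * x + b for a single input feature."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def mse(pairs, w, b):
    """Mean squared error of the fitted line on a list of (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in pairs) / len(pairs)


# Hypothetical synthetic data for illustration only: y = 3x + 1 plus Gaussian noise.
noise = random.Random(42)
xs = [i / 10 for i in range(100)]
data = [(x, 3.0 * x + 1.0 + noise.gauss(0.0, 0.1)) for x in xs]

train, test = split_dataset(data, test_fraction=0.2)
w, b = fit_line(train)        # the model only ever sees the training set
test_error = mse(test, w, b)  # performance is judged on held-out data
print(f"w = {w:.2f}, b = {b:.2f}, test MSE = {test_error:.4f}")
```

The key design point is that `fit_line` never sees the test pairs, so the test error estimates how the model behaves on data it was not trained on. Tuning a hyperparameter would add a third, validation split so that the test set stays untouched until the end.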
Chapter 5 provides an introduction to the field of biophysics, which applies the laws of physics to biological phenomena. We start with a discussion of proteins, the molecular machines that make life possible. A key component of predicting the effects of drugs on the body is understanding their interactions with proteins. In order to understand these effects, we begin with an overview of how proteins are constructed and how protein structures differ. Proteins are entities whose 3D structure dictates their biological function. For a machine learning model to predict the impact of a drug molecule on a protein’s function, we need to represent that 3D structure in a form that can be processed by a machine learning program. In the second half of Chapter 5, we explore a number of ways that protein structures can be represented. With this knowledge in hand, we then review another code example where we use deep learning to predict the degree to which a drug molecule will interact with a protein.

Genetics has become a key component of contemporary medicine. The genetic sequencing of tumors has enabled the personalized treatment of cancer and has the potential to revolutionize medicine. Gene sequencing, which used to be a complex process requiring huge investments, has now become commonplace and can be routinely carried out. We have even reached the point where dog owners can get inexpensive genetic tests to determine their pets’ lineage. In Chapter 6, we provide an overview of genetics and genomics, beginning with an introduction to DNA and RNA, the templates that are used to produce proteins. Recent discoveries have revealed that the interactions of DNA and RNA are much more complex than originally believed. In the second half of Chapter 6, we present several code examples that demonstrate how deep learning can be used to predict a number of factors that influence the interactions of DNA and RNA.

Earlier in this chapter, we alluded to the many advances that have come about through the application of deep learning to the analysis of biological and medical images. Many of the phenomena studied in these experiments are too small to be observed by the human eye. In order to obtain the images used with deep learning methods, we need to utilize a microscope. Chapter 7 provides an overview of microscopy in its myriad forms, ranging from the simple light microscope we all used in school to sophisticated instruments that are capable of obtaining images at atomic resolution. This chapter also covers some of the limitations of current approaches, and provides information on the experimental pipelines used to obtain the images that drive deep learning models.

One area that offers tremendous promise is the application of deep learning to medical diagnosis. Medicine is incredibly complex, and no physician can personally embody all of the available medical knowledge. In an ideal situation, a machine learning model could digest the medical literature and aid medical professionals in making diagnoses. While we have yet to reach this point, a number of positive steps have been made.
Chapter 8 begins with a history of machine learning methods for medical diagnosis and charts the transition from hand-encoded rules to statistical analysis of medical outcomes. As with many of the topics we’ve discussed, a key component is representing medical information in a format that can be processed by a machine learning program. In this chapter, we provide an introduction to electronic health records and some of the issues surrounding these records. In many cases, medical images can be very complex and the analysis and interpretation of these images can be difficult for even skilled human specialists. In these cases, deep learning can augment the skills of a human analyst by classifying images and identifying key features. Chapter 8 concludes with a number of examples of how deep learning is used to analyze medical images from a variety of areas.

As we mentioned earlier, machine learning is becoming a key component of drug discovery efforts. Scientists use deep learning models to evaluate the interactions between drug molecules and proteins. These interactions can elicit a biological response that has a therapeutic impact on a patient. The models we’ve discussed so far are discriminative models. Given a set of characteristics of a molecule, the model generates a prediction of some property. These predictions require an input molecule, which may be derived from a large database of available molecules or may come from the imagination of a scientist. What if, rather than relying on what currently exists, or what we can imagine, we had a computer program that could “invent” new molecules? Chapter 9 presents a type of deep learning program called a generative model. A generative model is initially trained on a set of existing molecules, then used to generate new molecules. The deep learning program that generates these molecules can also be influenced by other models that predict the activity of the new molecules.

Up to now, we have discussed deep learning models as “black boxes.” We present the model with a set of input data and the model generates a prediction, with no explanation of how or why the prediction was generated. This type of prediction can be less than optimal in many situations. If we have a deep learning model for medical diagnosis, we often need to understand the reasoning behind the diagnosis. An explanation of the reasons for the diagnosis will provide a physician with more confidence in the prediction and may also influence treatment decisions. One historic drawback to deep learning has been the fact that the models, while often reliable, can be difficult to interpret. A number of techniques are currently being developed to enable users to better understand the factors that led to a prediction. Chapter 10 provides an overview of some of these techniques used to enable human understanding of model predictions. Another important aspect of predictive models is the accuracy of a model’s predictions. An understanding of a model’s accuracy can help us determine how much to rely on that model. Given that machine learning can be used to potentially make life-saving diagnoses, an understanding of model accuracy is critical.
The final section of Chapter 10 provides an overview of some of the techniques that can be used to assess the accuracy of model predictions.

In Chapter 11 we present a real-world case study using DeepChem. In this example, we use a technique called virtual screening to identify potential starting points for the discovery of new drugs. Drug discovery is a complex process that often begins with a technique known as screening. Screening is used to identify molecules that can be optimized to eventually generate drugs. Screening can be carried out experimentally, where millions of molecules are tested in miniaturized biological tests known as assays, or in a computer using virtual screening. In virtual screening, a set of known drugs or other biologically active molecules is used to train a machine learning model. This machine learning model is then used to predict the activity of a large set of molecules. Because of the speed of machine learning methods, hundreds of millions of molecules can typically be processed in a few days of computer time.

The final chapter of the book examines the current impact and future potential of deep learning in the life sciences. A number of challenges for current efforts, including the availability and quality of datasets, are discussed. We also highlight opportunities and potential pitfalls in a number of other areas including diagnostics, personalized medicine, pharmaceutical development, and biology research.
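The virtual screening loop described for Chapter 11 (learn from known actives, then score and rank a large library) can be illustrated with a deliberately simple stand-in for a trained model: a nearest-neighbor similarity search. This is a sketch under stated assumptions, not the workflow from the book or a DeepChem API. Molecules are represented as hypothetical sets of fingerprint bit indices, each library compound is scored by its highest Tanimoto similarity to any known active, and the helper names `tanimoto`, `score_library`, and `screen` are illustrative only. In practice the fingerprints would come from a cheminformatics toolkit and the scorer would be a trained predictive model.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def score_library(actives, library):
    """Score each library compound by its highest similarity to any known active."""
    return {
        name: max(tanimoto(fp, active) for active in actives)
        for name, fp in library.items()
    }


def screen(actives, library, top_k):
    """Return the names of the top_k highest-scoring library compounds."""
    scores = score_library(actives, library)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]


# Hypothetical fingerprints: each molecule is a set of bit indices.
known_actives = [{1, 4, 7, 9}, {2, 4, 7, 11}]
library = {
    "cmpd_A": {1, 4, 7, 9, 12},  # close analog of the first active
    "cmpd_B": {3, 5, 8},         # shares no bits with either active
    "cmpd_C": {2, 4, 7},         # subset of the second active
}

print(screen(known_actives, library, top_k=2))
```

Running it prints `['cmpd_A', 'cmpd_C']`: compound A scores 0.8 against the first active, compound C scores 0.75 against the second, and compound B shares no bits with either. Swapping the similarity scorer for a trained model changes only `score_library`; the rank-and-select step stays the same.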

CHAPTER 2
Introduction to Deep Learning

The goal of this chapter is to introduce the basic principles of deep learning. If you already have lots of experience with deep learning, you should feel free to skim this chapter and then go on to the next. If you have less experience, you should study this chapter carefully as the material it covers will be essential to understanding the rest of the book.

In most of the problems we will discuss, our task will be to create a mathematical function:

$$\mathbf{y} = f(\mathbf{x})$$

Notice that $\mathbf{x}$ and $\mathbf{y}$ are written in bold. This indicates they are vectors. The function might take many numbers as input, perhaps thousands or even millions, and it might produce many numbers as outputs. Here are some examples of functions you might want to create:

• $\mathbf{x}$ contains the colors of all the pixels in an image. $f(\mathbf{x})$ should equal 1 if the image contains a cat and 0 if it does not.
• The same as above, except $f(\mathbf{x})$ should be a vector of numbers. The first element indicates whether the image contains a cat, the second whether it contains a dog, the third whether it contains an airplane, and so on for thousands of types of objects.
• $\mathbf{x}$ contains the DNA sequence for a chromosome. $\mathbf{y}$ should be a vector whose length equals the number of bases in the chromosome. Each element should equal 1 if that base is part of a region that codes for a protein, or 0 if not.
• $\mathbf{x}$ describes the structure of a molecule. (We will discuss various ways of representing molecules in later chapters.) $\mathbf{y}$ should be a vector where each element

The above is a preview of the first 20 pages.
