Uploader: 高宏飞
Shared on: 2025-12-19
Author: Alexander Zai, Brandon Brown

Deep Reinforcement Learning in Action teaches you how to program AI agents that adapt and improve based on direct feedback from their environment. In this example-rich tutorial, you’ll master foundational and advanced DRL techniques by taking on interesting challenges like navigating a maze and playing video games. Along the way, you’ll work with core algorithms, including deep Q-networks and policy gradients, along with industry-standard tools like PyTorch and OpenAI Gym. For readers with intermediate skills in Python and deep learning. Alexander Zai is a machine learning engineer at Amazon AI. Brandon Brown is a machine learning and data analysis blogger.

ISBN: 1617295434
Publisher: Manning Publications
Publish Year: 2020
Language: English
Pages: 277
File Format: PDF
File Size: 14.9 MB
Text Preview (First 20 pages)

(Cover) Deep Reinforcement Learning in Action, by Alexander Zai and Brandon Brown. Manning, "In Action" series.
Deep Reinforcement Learning in Action
Deep Reinforcement Learning in Action
Brandon Brown and Alexander Zai
Manning, Shelter Island
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact: Special Sales Department, Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964. Email: orders@manning.com

©2020 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Development editor: Karen Miller
Technical development editor: Marc-Philippe Huget
Review editor: Ivan Martinović
Production editor: Deirdre Hiam
Copy editor: Andy Carroll
Proofreader: Jason Everett
Technical proofreader: Al Krinker
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964

ISBN: 9781617295430
Printed in the United States of America
brief contents

PART 1 FOUNDATIONS 1
1 ■ What is reinforcement learning? 3
2 ■ Modeling reinforcement learning problems: Markov decision processes 23
3 ■ Predicting the best states and actions: Deep Q-networks 54
4 ■ Learning to pick the best policy: Policy gradient methods 90
5 ■ Tackling more complex problems with actor-critic methods 111

PART 2 ABOVE AND BEYOND 139
6 ■ Alternative optimization methods: Evolutionary algorithms 141
7 ■ Distributional DQN: Getting the full story 167
8 ■ Curiosity-driven exploration 210
9 ■ Multi-agent reinforcement learning 243
10 ■ Interpretable reinforcement learning: Attention and relational models 283
11 ■ In conclusion: A review and roadmap 329
contents

preface xiii
acknowledgments xv
about this book xvi
about the authors xix
about the cover illustration xx

PART 1 FOUNDATIONS 1

1 What is reinforcement learning? 3
1.1 The "deep" in deep reinforcement learning 4
1.2 Reinforcement learning 6
1.3 Dynamic programming versus Monte Carlo 9
1.4 The reinforcement learning framework 10
1.5 What can I do with reinforcement learning? 14
1.6 Why deep reinforcement learning? 16
1.7 Our didactic tool: String diagrams 18
1.8 What's next? 20

2 Modeling reinforcement learning problems: Markov decision processes 23
2.1 String diagrams and our teaching methods 23
2.2 Solving the multi-arm bandit 28
    Exploration and exploitation 29 ■ Epsilon-greedy strategy 30 ■ Softmax selection policy 35
2.3 Applying bandits to optimize ad placements 37
    Contextual bandits 38 ■ States, actions, rewards 39
2.4 Building networks with PyTorch 40
    Automatic differentiation 40 ■ Building Models 41
2.5 Solving contextual bandits 42
2.6 The Markov property 47
2.7 Predicting future rewards: Value and policy functions 49
    Policy functions 50 ■ Optimal policy 51 ■ Value functions 51

3 Predicting the best states and actions: Deep Q-networks 54
3.1 The Q function 55
3.2 Navigating with Q-learning 56
    What is Q-learning? 56 ■ Tackling Gridworld 57 ■ Hyperparameters 59 ■ Discount factor 60 ■ Building the network 61 ■ Introducing the Gridworld game engine 63 ■ A neural network as the Q function 65
3.3 Preventing catastrophic forgetting: Experience replay 75
    Catastrophic forgetting 75 ■ Experience replay 76
3.4 Improving stability with a target network 80
    Learning instability 81
3.5 Review 86

4 Learning to pick the best policy: Policy gradient methods 90
4.1 Policy function using neural networks 91
    Neural network as the policy function 91 ■ Stochastic policy gradient 92 ■ Exploration 94
4.2 Reinforcing good actions: The policy gradient algorithm 95
    Defining an objective 95 ■ Action reinforcement 97 ■ Log probability 98 ■ Credit assignment 99
4.3 Working with OpenAI Gym 100
    CartPole 102 ■ The OpenAI Gym API 103
4.4 The REINFORCE algorithm 103
    Creating the policy network 104 ■ Having the agent interact with the environment 104 ■ Training the model 105 ■ The full training loop 107 ■ Chapter conclusion 108

5 Tackling more complex problems with actor-critic methods 111
5.1 Combining the value and policy function 113
5.2 Distributed training 118
5.3 Advantage actor-critic 123
5.4 N-step actor-critic 132

PART 2 ABOVE AND BEYOND 139

6 Alternative optimization methods: Evolutionary algorithms 141
6.1 A different approach to reinforcement learning 142
6.2 Reinforcement learning with evolution strategies 143
    Evolution in theory 143 ■ Evolution in practice 147
6.3 A genetic algorithm for CartPole 151
6.4 Pros and cons of evolutionary algorithms 158
    Evolutionary algorithms explore more 158 ■ Evolutionary algorithms are incredibly sample intensive 158 ■ Simulators 159
6.5 Evolutionary algorithms as a scalable alternative 159
    Scaling evolutionary algorithms 160 ■ Parallel vs. serial processing 161 ■ Scaling efficiency 162 ■ Communicating between nodes 163 ■ Scaling linearly 165 ■ Scaling gradient-based approaches 165

7 Distributional DQN: Getting the full story 167
7.1 What's wrong with Q-learning? 168
7.2 Probability and statistics revisited 173
    Priors and posteriors 175 ■ Expectation and variance 176
7.3 The Bellman equation 180
    The distributional Bellman equation 180
7.4 Distributional Q-learning 181
    Representing a probability distribution in Python 182 ■ Implementing the Dist-DQN 191
7.5 Comparing probability distributions 193
7.6 Dist-DQN on simulated data 198
7.7 Using distributional Q-learning to play Freeway 203

8 Curiosity-driven exploration 210
8.1 Tackling sparse rewards with predictive coding 212
8.2 Inverse dynamics prediction 215
8.3 Setting up Super Mario Bros. 218
8.4 Preprocessing and the Q-network 221
8.5 Setting up the Q-network and policy function 223
8.6 Intrinsic curiosity module 226
8.7 Alternative intrinsic reward mechanisms 239

9 Multi-agent reinforcement learning 243
9.1 From one to many agents 244
9.2 Neighborhood Q-learning 248
9.3 The 1D Ising model 252
9.4 Mean field Q-learning and the 2D Ising model 261
9.5 Mixed cooperative-competitive games 271

10 Interpretable reinforcement learning: Attention and relational models 283
10.1 Machine learning interpretability with attention and relational biases 284
    Invariance and equivariance 286
10.2 Relational reasoning with attention 287
    Attention models 288 ■ Relational reasoning 290 ■ Self-attention models 295
10.3 Implementing self-attention for MNIST 298
    Transformed MNIST 298 ■ The relational module 299 ■ Tensor contractions and Einstein notation 303 ■ Training the relational module 306
10.4 Multi-head attention and relational DQN 310
10.5 Double Q-learning 317
10.6 Training and attention visualization 319
    Maximum entropy learning 323 ■ Curriculum learning 323 ■ Visualizing attention weights 323

11 In conclusion: A review and roadmap 329
11.1 What did we learn? 329
11.2 The uncharted topics in deep reinforcement learning 331
    Prioritized experience replay 331 ■ Proximal policy optimization (PPO) 332 ■ Hierarchical reinforcement learning and the options framework 333 ■ Model-based planning 333 ■ Monte Carlo tree search (MCTS) 334
11.3 The end 335

appendix Mathematics, deep learning, PyTorch 336
Reference list 348
index 351
preface

Deep reinforcement learning was launched into the spotlight in 2015, when DeepMind produced an algorithm capable of playing a suite of Atari 2600 games at superhuman performance. Artificial intelligence seemed to be finally making some real progress, and we wanted to be a part of it.

Both of us have software engineering backgrounds and an interest in neuroscience, and we've been interested in the broader field of artificial intelligence for a long time (in fact, one of us actually wrote his first neural network before high school in C#). These early experiences did not lead to any sustained interest, since this was before the deep learning revolution circa 2012, when the superlative performance of deep learning was clear. But after seeing the amazing successes of deep learning around this time, we became recommitted to being a part of the exciting and burgeoning fields of deep learning and then deep reinforcement learning, and both of us have incorporated machine learning more broadly into our careers in one way or another. Alex transitioned into a career as a machine learning engineer, making his mark at little-known places like Amazon, and Brandon began using machine learning in academic neuroscience research.

As we delved into deep reinforcement learning, we had to struggle through dozens of textbooks and primary research articles, parsing advanced mathematics and machine learning theory. Yet we found that the fundamentals of deep reinforcement learning are actually quite approachable from a software engineering background. All of the math can be easily translated into a language that any programmer would find quite readable.

We began blogging about the things we were learning in the machine learning world and projects that we were using in our work. We ended up getting a fair amount of positive feedback, which led us to the idea of collaborating on this book. We believe that most of the resources out there for learning hard things are either too simple and leave out the most compelling aspects of the topic or are inaccessible to people without sophisticated mathematics backgrounds. This book is our effort at translating a body of work written for experts into a course for those with nothing more than a programming background and some basic knowledge of neural networks. We employ some novel teaching methods that we think set our book apart and lead to much faster understanding. We start from the basics, and by the end you will be implementing cutting-edge algorithms invented by industry-based research groups like DeepMind and OpenAI, as well as from high-powered academic labs like the Berkeley Artificial Intelligence Research (BAIR) Lab and University College London.
acknowledgments

This book took way longer than we anticipated, and we owe a lot to our editors Candace West and Susanna Kline for helping us at every stage of the process and keeping us on track. There are a lot of details to keep track of when writing a book, and without the professional and supportive editorial staff we would have floundered. We'd also like to thank our technical editors Marc-Philippe Huget and Al Krinker and all of the reviewers who took the time to read our manuscript and provide us with crucial feedback. In particular, we thank the reviewers: Al Rahimi, Ariel Gamiño, Claudio Bernardo Rodriguez, David Krief, Dr. Brett Pennington, Ezra Joel Schroeder, George L. Gaines, Godfred Asamoah, Helmut Hauschild, Ike Okonkwo, Jonathan Wood, Kalyan Reddy, M. Edward (Ed) Borasky, Michael Haller, Nadia Noori, Satyajit Sarangi, and Tobias Kaatz. We would also like to thank everyone at Manning who worked on this project: Karen Miller, the developmental editor; Ivan Martinović, the review editor; Deirdre Hiam, the project editor; Andy Carroll, the copy editor; and Jason Everett, the proofreader.

In this age, many books are self-published using various online services, and we were initially tempted by this option; however, after having been through this whole process, we can see the tremendous value in professional editing staff. In particular, we thank copy editor Andy Carroll for his insightful feedback that dramatically improved the clarity of the text.

Alex thanks his PI Jamie who introduced him to machine learning early in his undergraduate career. Brandon thanks his wife Xinzhu for putting up with his late nights of writing and time away from the family and for giving him two wonderful children, Isla and Avin.
about this book

Who should read this book

Deep Reinforcement Learning in Action is a course designed to take you from the very foundational concepts in reinforcement learning all the way to implementing the latest algorithms. As a course, each chapter centers around one major project meant to illustrate the topic or concept of that chapter. We've designed each project to be something that can be efficiently run on a modern laptop; we don't expect you to have access to expensive GPUs or cloud computing resources (though access to these resources does make things run faster).

This book is for individuals with a programming background, in particular a working knowledge of Python, and for people who have at least a basic understanding of neural networks (a.k.a. deep learning). By "basic understanding," we mean that you have at least tried implementing a simple neural network in Python, even if you didn't fully understand what was going on under the hood. Although this book is focused on using neural networks for the purposes of reinforcement learning, you will also probably learn a lot of new things about deep learning in general that can be applied to other problems outside of reinforcement learning, so you do not need to be an expert at deep learning before jumping into deep reinforcement learning.
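As a rough benchmark for that prerequisite, the following is a minimal sketch of the kind of "simple neural network" we have in mind, written with PyTorch (the library used throughout the book). The layer sizes, toy data, and hyperparameters here are purely illustrative and are not drawn from any of the book's listings.

```python
# A minimal two-layer network trained on a toy regression task with PyTorch.
# Everything here (layer sizes, learning rate, fake data) is illustrative only.
import torch
import torch.nn as nn

# Fake data: 100 samples with 4 features, and a noisy linear target
X = torch.randn(100, 4)
y = X.sum(dim=1, keepdim=True) + 0.1 * torch.randn(100, 1)

# A small feedforward network: 4 inputs -> 16 hidden units -> 1 output
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for step in range(200):
    optimizer.zero_grad()      # clear gradients from the previous step
    pred = model(X)            # forward pass
    loss = loss_fn(pred, y)    # mean squared error
    loss.backward()            # backpropagate
    optimizer.step()           # update the weights

print(loss.item())             # final training loss
```

If you can read a snippet like this and roughly follow what each step is doing, you have all the deep learning background the book assumes.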
How this book is organized: A roadmap

The book has two sections with 11 chapters. Part 1 explains the fundamentals of deep reinforcement learning.

■ Chapter 1 gives a high-level introduction to deep learning, reinforcement learning, and the marriage of the two into deep reinforcement learning.
■ Chapter 2 introduces the fundamental concepts of reinforcement learning that will reappear through the rest of the book. We also implement our first practical reinforcement learning algorithm.
■ Chapter 3 introduces deep Q-learning, one of the two broad classes of deep reinforcement learning algorithms. This is the algorithm that DeepMind used to outperform humans at many Atari 2600 games in 2015.
■ Chapter 4 describes the other major class of deep reinforcement learning algorithms, policy-gradient methods. We use this to train an algorithm to play a simple game.
■ Chapter 5 shows how we can combine deep Q-learning from chapter 3 and policy-gradient methods from chapter 4 into a combined class of algorithms called actor-critic algorithms.

Part 2 builds on the foundations we built in part 1 to cover the biggest advances in deep reinforcement learning in recent years.

■ Chapter 6 shows how to implement evolutionary algorithms, which use principles of biological evolution, to train neural networks.
■ Chapter 7 describes a method to significantly improve the performance of deep Q-learning by incorporating probabilistic concepts.
■ Chapter 8 introduces a way to give reinforcement learning algorithms a sense of curiosity to explore their environments without any external cues.
■ Chapter 9 shows how to extend what we have learned in training single-agent reinforcement learning algorithms into systems that have multiple interacting agents.
■ Chapter 10 describes how to make deep reinforcement learning algorithms more interpretable and efficient by using attention mechanisms.
■ Chapter 11 concludes the book by discussing all the exciting areas in deep reinforcement learning we didn't have the space to cover but that you may be interested in.

The chapters in part 1 should be read in order, as each chapter builds on the concepts in the previous chapter. The chapters in part 2 can more or less be approached in any order, although we still recommend reading them in order.
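Several of the projects use OpenAI Gym's CartPole environment (chapters 4 and 6, for example). To give a rough preview of what interacting with such an environment looks like, here is a minimal sketch of the agent-environment loop using the classic (pre-0.26) Gym API, which was current when the book was published. A random policy stands in for the agents you will actually build, and nothing here is taken from the book's listings.

```python
# A minimal agent-environment loop with the classic OpenAI Gym API (gym < 0.26).
# The random policy below is only a placeholder for a learned agent.
import gym

env = gym.make("CartPole-v0")

for episode in range(5):
    state = env.reset()    # classic API: reset() returns only the observation
    total_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()            # random action (placeholder policy)
        state, reward, done, info = env.step(action)  # classic 4-tuple return
        total_reward += reward
    print(f"episode {episode}: reward = {total_reward}")

env.close()
```

Every project in the book is, at its core, a smarter way of choosing the action in that loop.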
About the code

As we noted, this book is a course, so we have included all of the code necessary to run the projects within the main text of the book. In general, we include shorter code blocks as inline code, which is formatted in this font, as well as code in separate numbered code listings that represent larger code blocks.

At press time we are confident all the in-text code is working, but we cannot guarantee that the code will continue to be bug free (especially for those of you reading this in print) in the long term, as the deep learning field and consequently its libraries are evolving quickly. The in-text code has also been pared down to the minimum necessary to get the projects working, so we highly recommend you follow the projects in the book using the code in this book's GitHub repository: http://mng.bz/JzKp. We intend to keep the code on GitHub up to date for the foreseeable future, and it also includes additional comments and code that we used to generate many of the figures in the book. Hence, it is best if you read the book alongside the corresponding code in the Jupyter Notebooks found on the GitHub repository.

We are confident that this book will teach you the concepts of deep reinforcement learning and not just how to narrowly code things in Python. If Python were to somehow disappear after you finish this book, you would still be able to implement all of these algorithms in some other language or framework, since you will understand the fundamentals.

liveBook discussion forum

Purchase of Deep Reinforcement Learning in Action includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the authors and from other users. To access the forum, go to https://livebook.manning.com/#!/book/deep-reinforcement-learning-in-action/discussion. You can also learn more about Manning's forums and the rules of conduct at https://livebook.manning.com/#!/discussion.

Manning's commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the authors can take place. It is not a commitment to any specific amount of participation on the part of the authors, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the authors some challenging questions lest their interest stray! The forum and the archives of previous discussions will be accessible from the publisher's website as long as the book is in print.