MANNING
Max Pumperla
Kevin Ferguson
Foreword by Thore Graepel
Deep Learning and the Game of Go
Deep Learning and the Game of Go
MAX PUMPERLA AND KEVIN FERGUSON
MANNING
Shelter Island
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

©2019 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Jenny Stout
Technical development editor: Charles Feduke
Review editor: Ivan Martinović
Project editor: Lori Weidert
Copyeditor: Sharon Wilkey
Proofreader: Michelle Melani
Technical proofreader: Tanya Wilke
Typesetter: Gordan Salinovic
Cover designer: Marija Tudor

ISBN 9781617295324
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – SP – 23 22 21 20 19 18
To Anne, it’s all for you.
—Max

To Ian
—Kevin
brief contents

PART 1 FOUNDATIONS .................................................. 1
1 ■ Toward deep learning: a machine-learning introduction 3
2 ■ Go as a machine-learning problem 18
3 ■ Implementing your first Go bot 29

PART 2 MACHINE LEARNING AND GAME AI ............................... 55
4 ■ Playing games with tree search 57
5 ■ Getting started with neural networks 85
6 ■ Designing a neural network for Go data 116
7 ■ Learning from data: a deep-learning bot 148
8 ■ Deploying bots in the wild 178
9 ■ Learning by practice: reinforcement learning 200
10 ■ Reinforcement learning with policy gradients 218
11 ■ Reinforcement learning with value methods 233
12 ■ Reinforcement learning with actor-critic methods 247
PART 3 GREATER THAN THE SUM OF ITS PARTS ......................... 263
13 ■ AlphaGo: Bringing it all together 265
14 ■ AlphaGo Zero: Integrating tree search with reinforcement learning 289
contents

foreword xvii
preface xix
acknowledgments xxi
about this book xxii
about the authors xxvii
about the cover illustration xxviii

PART 1 FOUNDATIONS .................................................. 1

1 Toward deep learning: a machine-learning introduction 3
1.1 What is machine learning? 4
    How does machine learning relate to AI? 6 ■ What you can and can’t do with machine learning 7
1.2 Machine learning by example 7
    Using machine learning in software applications 10 ■ Supervised learning 12 ■ Unsupervised learning 13 ■ Reinforcement learning 13
1.3 Deep learning 14
1.4 What you’ll learn in this book 16
1.5 Summary 16
2 Go as a machine-learning problem 18
2.1 Why games? 18
2.2 A lightning introduction to the game of Go 19
    Understanding the board 19 ■ Placing and capturing stones 20 ■ Ending the game and counting 21 ■ Understanding ko 22
2.3 Handicaps 23
2.4 Where to learn more 23
2.5 What can we teach a machine? 24
    Selecting moves in the opening 24 ■ Searching game states 24 ■ Reducing the number of moves to consider 25 ■ Evaluating game states 25
2.6 How to measure your Go AI’s strength 26
    Traditional Go ranks 26 ■ Benchmarking your Go AI 27
2.7 Summary 27

3 Implementing your first Go bot 29
3.1 Representing a game of Go in Python 30
    Implementing the Go board 33 ■ Tracking connected groups of stones in Go: strings 33 ■ Placing and capturing stones on a Go board 34
3.2 Capturing game state and checking for illegal moves 37
    Self-capture 38 ■ Ko 39
3.3 Ending a game 40
3.4 Creating your first bot: the weakest Go AI imaginable 42
3.5 Speeding up game play with Zobrist hashing 46
3.6 Playing against your bot 51
3.7 Summary 53

PART 2 MACHINE LEARNING AND GAME AI ............................... 55

4 Playing games with tree search 57
4.1 Classifying games 58
4.2 Anticipating your opponent with minimax search 59
4.3 Solving tic-tac-toe: a minimax example 62
4.4 Reducing search space with pruning 65
    Reducing search depth with position evaluation 67 ■ Reducing search width with alpha-beta pruning 70
4.5 Evaluating game states with Monte Carlo tree search 73
    Implementing Monte Carlo tree search in Python 76 ■ How to select which branch to explore 79 ■ Applying Monte Carlo tree search to Go 82
4.6 Summary 83

5 Getting started with neural networks 85
5.1 A simple use case: classifying handwritten digits 86
    The MNIST data set of handwritten digits 87 ■ MNIST data preprocessing 87
5.2 The basics of neural networks 94
    Logistic regression as a simple artificial neural network 94 ■ Networks with more than one output dimension 94
5.3 Feed-forward networks 95
5.4 How good are our predictions? Loss functions and optimization 98
    What is a loss function? 98 ■ Mean squared error 99 ■ Finding minima in loss functions 100 ■ Gradient descent to find minima 101 ■ Stochastic gradient descent for loss functions 102 ■ Propagating gradients back through your network 104
5.5 Training a neural network step-by-step in Python 106
    Neural network layers in Python 107 ■ Activation layers in neural networks 108 ■ Dense layers in Python as building blocks for feed-forward networks 109 ■ Sequential neural networks with Python 111 ■ Applying your network to handwritten digit classification 113
5.6 Summary 115

6 Designing a neural network for Go data 116
6.1 Encoding a Go game position for neural networks 118
6.2 Generating tree-search games as network training data 121
6.3 Using the Keras deep-learning library 124
    Understanding Keras design principles 124 ■ Installing the Keras deep-learning library 125 ■ Running a familiar first example with Keras 125 ■ Go move prediction with feed-forward neural networks in Keras 127
6.4 Analyzing space with convolutional networks 131
    What convolutions do intuitively 131 ■ Building convolutional neural networks with Keras 135 ■ Reducing space with pooling layers 136
6.5 Predicting Go move probabilities 138
    Using the softmax activation function in the last layer 138 ■ Cross-entropy loss for classification problems 139
6.6 Building deeper networks with dropout and rectified linear units 141
    Dropping neurons for regularization 141 ■ The rectified linear unit activation function 142
6.7 Putting it all together for a stronger Go move-prediction network 143
6.8 Summary 147

7 Learning from data: a deep-learning bot 148
7.1 Importing Go game records 150
    The SGF file format 150 ■ Downloading and replaying Go game records from KGS 151
7.2 Preparing Go data for deep learning 152
    Replaying a Go game from an SGF record 152 ■ Building a Go data processor 154 ■ Building a Go data generator to load data efficiently 161 ■ Parallel Go data processing and generators 163
7.3 Training a deep-learning model on human game-play data 164
7.4 Building more-realistic Go data encoders 168
7.5 Training efficiently with adaptive gradients 171
    Decay and momentum in SGD 171 ■ Optimizing neural networks with Adagrad 172 ■ Refining adaptive gradients with Adadelta 173
7.6 Running your own experiments and evaluating performance 174
    A guideline to testing architectures and hyperparameters 174 ■ Evaluating performance metrics for training and test data 176
7.7 Summary 177

8 Deploying bots in the wild 178
8.1 Creating a move-prediction agent from a deep neural network 179
8.2 Serving your Go bot to a web frontend 182
    An end-to-end Go bot example 184
8.3 Training and deploying a Go bot in the cloud 186
8.4 Talking to other bots: the Go Text Protocol 186
8.5 Competing against other bots locally 189
    When a bot should pass or resign 189 ■ Let your bot play against other Go programs 190
8.6 Deploying a Go bot to an online Go server 195
    Registering a bot at the Online Go Server 199
8.7 Summary 199

9 Learning by practice: reinforcement learning 200
9.1 The reinforcement-learning cycle 201
9.2 What goes into experience? 203
9.3 Building an agent that can learn 206
    Sampling from a probability distribution 207 ■ Clipping a probability distribution 208 ■ Initializing an agent 209 ■ Loading and saving your agent from disk 209 ■ Implementing move selection 211
9.4 Self-play: how a computer program practices 212
    Representing experience data 212 ■ Simulating games 215
9.5 Summary 217

10 Reinforcement learning with policy gradients 218
10.1 How random games can identify good decisions 219
10.2 Modifying neural network policies with gradient descent 223
10.3 Tips for training with self-play 226
    Evaluating your progress 227 ■ Measuring small differences in strength 227 ■ Tuning a stochastic gradient descent (SGD) optimizer 228
10.4 Summary 232

11 Reinforcement learning with value methods 233
11.1 Playing games with Q-learning 234
11.2 Q-learning with Keras 238
    Building two-input networks in Keras 238 ■ Implementing the ε-greedy policy with Keras 242 ■ Training an action-value function 245
11.3 Summary 246
12 Reinforcement learning with actor-critic methods 247
12.1 Advantage tells you which decisions are important 248
    What is advantage? 248 ■ Calculating advantage during self-play 251
12.2 Designing a neural network for actor-critic learning 252
12.3 Playing games with an actor-critic agent 254
12.4 Training an actor-critic agent from experience data 255
12.5 Summary 261

PART 3 GREATER THAN THE SUM OF ITS PARTS ......................... 263

13 AlphaGo: Bringing it all together 265
13.1 Training deep neural networks for AlphaGo 268
    Network architectures in AlphaGo 268 ■ The AlphaGo board encoder 270 ■ Training AlphaGo-style policy networks 273
13.2 Bootstrapping self-play from policy networks 275
13.3 Deriving a value network from self-play data 276
13.4 Better search with policy and value networks 277
    Using neural networks to improve Monte Carlo rollouts 277 ■ Tree search with a combined value function 279 ■ Implementing AlphaGo’s search algorithm 281
13.5 Practical considerations for training your own AlphaGo 287
13.6 Summary 288

14 AlphaGo Zero: Integrating tree search with reinforcement learning 289
14.1 Building a neural network for tree search 290
14.2 Guiding tree search with a neural network 292
    Walking down the tree 295 ■ Expanding the tree 298 ■ Selecting a move 300
14.3 Training 301
14.4 Improving exploration with Dirichlet noise 305
14.5 Modern techniques for deeper neural networks 307
    Batch normalization 307 ■ Residual networks 308
14.6 Exploring additional resources 308
14.7 Wrapping up 309
14.8 Summary 309

appendix A Mathematical foundations 311
appendix B The backpropagation algorithm 318
appendix C Go programs and servers 323
appendix D Training and deploying bots by using Amazon Web Services 326
appendix E Submitting a bot to the Online Go Server 335

index 341
foreword

For us, the members of the AlphaGo team, the AlphaGo story was the adventure of a lifetime. It began, as many great adventures do, with a small step: training a simple convolutional neural network on records of Go games played by strong human players. This led to pivotal breakthroughs in the recent development of machine learning, as well as a series of unforgettable events, including matches against the formidable Go professionals Fan Hui, Lee Sedol, and Ke Jie. We’re proud to see the lasting impact of these matches on the way Go is played around the world, as well as their role in making more people aware of, and interested in, the field of artificial intelligence.

But why, you might ask, should we care about games? Just as children use games to learn about aspects of the real world, so researchers in machine learning use them to train artificial software agents. In this vein, the AlphaGo project is part of DeepMind’s strategy to use games as simulated microcosms of the real world. This helps us study artificial intelligence and train learning agents, with the goal of one day building general-purpose learning systems capable of solving the world’s most complex problems.

AlphaGo works in a way that is similar to the two modes of thinking that Nobel laureate Daniel Kahneman describes in his book on human cognition, Thinking, Fast and Slow. In the case of AlphaGo, the slow mode of thinking is carried out by a planning algorithm called Monte Carlo Tree Search, which plans from a given position by expanding the game tree that represents possible future moves and countermoves. But with roughly 10^170 (a 1 followed by 170 zeros) possible Go positions, searching through every sequence of a game proves impossible. To get around this and to reduce the size of the search space, we paired Monte Carlo Tree Search with a deep-learning component: two neural networks trained to estimate how likely each side is to win, and what the most promising moves are.
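To give a flavor of how these pieces fit together, here is a minimal, illustrative Python sketch of policy-guided tree search. It is nothing like the production code behind AlphaGo, and names such as prior and c_puct, along with the constants, are only illustrative choices; but it captures the core idea that the policy network’s move probabilities bias which branches the search explores, while accumulated value estimates reward branches that have looked good so far.

import math

class TreeNode:
    def __init__(self, prior):
        self.prior = prior        # move probability from the policy network
        self.visit_count = 0
        self.total_value = 0.0    # sum of value-network estimates seen in this branch
        self.children = {}        # maps a move to its child TreeNode

    def q_value(self):
        # Average estimated chance of winning for this branch so far.
        return self.total_value / self.visit_count if self.visit_count else 0.0

def select_child(node, c_puct=2.0):
    # Pick the child maximizing Q + U: exploit branches that have looked
    # good so far (Q), but keep exploring moves the policy network rates
    # highly and that have rarely been visited (U).
    total_visits = sum(child.visit_count for child in node.children.values())
    def score(child):
        exploration = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return child.q_value() + exploration
    return max(node.children.items(), key=lambda item: score(item[1]))

Repeating this selection step from the root, expanding a leaf, scoring it with the value network, and propagating the result back up the tree is, in outline, how the search concentrates on a handful of promising lines instead of all of the roughly 10^170 possibilities.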
A later version, AlphaZero, uses principles of reinforcement learning to play entirely against itself, eliminating the need for any human training data. It learned from scratch the game of Go (as well as chess and shogi), often discovering (and later discarding) many strategies developed by human players over hundreds of years, and creating many of its own unique strategies along the way.

Over the course of this book, Max Pumperla and Kevin Ferguson take you on this fascinating journey from AlphaGo through to its later extensions. By the end, you will not only understand how to implement an AlphaGo-style Go engine, but you will also have a great practical understanding of some of the most important building blocks of modern AI algorithms: Monte Carlo Tree Search, deep learning, and reinforcement learning. The authors have carefully tied these topics together, using the game of Go as an exciting and accessible running example. As an aside, you will have learned the basics of one of the most beautiful and challenging games ever invented.

Furthermore, the book empowers you from the beginning to build a working Go bot, which develops over the course of the book from making entirely random moves to becoming a sophisticated self-learning Go AI. The authors take you by the hand, providing both excellent explanations of the underlying concepts and executable Python code. They do not hesitate to dive into the details of topics like data formats, deployment, and cloud computing that you need to actually get your Go bot to work and play.

In summary, Deep Learning and the Game of Go is a highly readable and engaging introduction to modern artificial intelligence and machine learning. It succeeds in taking what has been described as one of the most exciting milestones in artificial intelligence and transforming it into an enjoyable first course in the subject. Any reader who follows this path will be equipped to understand and build modern AI systems, with possible applications in all those situations that require a combination of “fast” pattern matching and “slow” planning; that is, the thinking, fast and slow, required for basic cognition.

—THORE GRAEPEL, RESEARCH SCIENTIST, DEEPMIND,
ON BEHALF OF THE ALPHAGO TEAM AT DEEPMIND