Artificial Intelligence in Finance: A Python-Based Guide
Yves Hilpisch
Boston Farnham Sebastopol Tokyo Beijing
978-1-492-05543-3 [LSI]

Artificial Intelligence in Finance
by Yves Hilpisch

Copyright © 2021 Yves Hilpisch. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Michelle Smith
Development Editor: Corbin Collins
Production Editor: Daniel Elfanbaum
Copyeditor: Piper Editorial, LLC
Proofreader: JM Olejarz
Indexer: Potomac Indexing, LLC
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: O’Reilly Media, Inc.

October 2020: First Edition

Revision History for the First Edition
2020-10-14: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781492055433 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Artificial Intelligence in Finance, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
This book is not intended as financial advice. Please consult a qualified professional if you require financial advice.
Table of Contents
Preface
Part I. Machine Intelligence
1. Artificial Intelligence
   Algorithms
   Types of Data
   Types of Learning
   Types of Tasks
   Types of Approaches
   Neural Networks
   OLS Regression
   Estimation with Neural Networks
   Classification with Neural Networks
   Importance of Data
   Small Data Set
   Larger Data Set
   Big Data
   Conclusions
   References
2. Superintelligence
   Success Stories
   Atari
   Go
   Chess
   Importance of Hardware
   Forms of Intelligence
   Paths to Superintelligence
   Networks and Organizations
   Biological Enhancements
   Brain-Machine Hybrids
   Whole Brain Emulation
   Artificial Intelligence
   Intelligence Explosion
   Goals and Control
   Superintelligence and Goals
   Superintelligence and Control
   Potential Outcomes
   Conclusions
   References
Part II. Finance and Machine Learning
3. Normative Finance
   Uncertainty and Risk
   Definitions
   Numerical Example
   Expected Utility Theory
   Assumptions and Results
   Numerical Example
   Mean-Variance Portfolio Theory
   Assumptions and Results
   Numerical Example
   Capital Asset Pricing Model
   Assumptions and Results
   Numerical Example
   Arbitrage Pricing Theory
   Assumptions and Results
   Numerical Example
   Conclusions
   References
4. Data-Driven Finance
   Scientific Method
   Financial Econometrics and Regression
   Data Availability
   Programmatic APIs
   Structured Historical Data
   Structured Streaming Data
   Unstructured Historical Data
   Unstructured Streaming Data
   Alternative Data
   Normative Theories Revisited
   Expected Utility and Reality
   Mean-Variance Portfolio Theory
   Capital Asset Pricing Model
   Arbitrage Pricing Theory
   Debunking Central Assumptions
   Normally Distributed Returns
   Linear Relationships
   Conclusions
   References
   Python Code
5. Machine Learning
   Learning
   Data
   Success
   Capacity
   Evaluation
   Bias and Variance
   Cross-Validation
   Conclusions
   References
6. AI-First Finance
   Efficient Markets
   Market Prediction Based on Returns Data
   Market Prediction with More Features
   Market Prediction Intraday
   Conclusions
   References
Part III. Statistical Inefficiencies
7. Dense Neural Networks
   The Data
   Baseline Prediction
   Normalization
   Dropout
   Regularization
   Bagging
   Optimizers
   Conclusions
   References
8. Recurrent Neural Networks
   First Example
   Second Example
   Financial Price Series
   Financial Return Series
   Financial Features
   Estimation
   Classification
   Deep RNNs
   Conclusions
   References
9. Reinforcement Learning
   Fundamental Notions
   OpenAI Gym
   Monte Carlo Agent
   Neural Network Agent
   DQL Agent
   Simple Finance Gym
   Better Finance Gym
   FQL Agent
   Conclusions
   References
Part IV. Algorithmic Trading
10. Vectorized Backtesting
   Backtesting an SMA-Based Strategy
   Backtesting a Daily DNN-Based Strategy
   Backtesting an Intraday DNN-Based Strategy
   Conclusions
   References
11. Risk Management
   Trading Bot
   Vectorized Backtesting
   Event-Based Backtesting
   Assessing Risk
   Backtesting Risk Measures
   Stop Loss
   Trailing Stop Loss
   Take Profit
   Conclusions
   References
   Python Code
   Finance Environment
   Trading Bot
   Backtesting Base Class
   Backtesting Class
12. Execution and Deployment
   Oanda Account
   Data Retrieval
   Order Execution
   Trading Bot
   Deployment
   Conclusions
   References
   Python Code
   Oanda Environment
   Vectorized Backtesting
   Oanda Trading Bot
Part V. Outlook
13. AI-Based Competition
   AI and Finance
   Lack of Standardization
   Education and Training
   Fight for Resources
   Market Impact
   Competitive Scenarios
   Risks, Regulation, and Oversight
   Conclusions
   References
14. Financial Singularity
   Notions and Definitions
   What Is at Stake?
   Paths to Financial Singularity
   Orthogonal Skills and Resources
   Scenarios Before and After
   Star Trek or Star Wars
   Conclusions
   References
Part VI. Appendixes
A. Interactive Neural Networks
B. Neural Network Classes
C. Convolutional Neural Networks
Index
Preface

Will alpha eventually go to zero for every imaginable investment strategy? More fundamentally, is the day approaching when, thanks to so many smart people and smarter computers, financial markets really do become perfect, and we can just sit back, relax, and assume that all assets are priced correctly?
Robert Shiller (2015)

Artificial intelligence (AI) rose to become a key technology in the 2010s and is expected to be the dominant technology of the 2020s. Spurred by technological innovations, algorithmic breakthroughs, the availability of big data, and ever-increasing compute power, many industries are undergoing fundamental changes driven by AI. While media and public attention mostly focus on breakthroughs in areas such as gaming and self-driving cars, AI has also become a major technological force in the financial industry. However, it is safe to say that AI in finance is still at a nascent stage compared, for example, to industries such as web search or social media.

This book sets out to cover a number of important aspects related to AI in finance. AI in finance is already a vast topic, and a single book needs to focus on selected aspects. Therefore, this book covers the basics first (see Part I and Part II). It then zooms in on discovering statistical inefficiencies in financial markets through the use of AI and, more specifically, neural networks (see Part III). Such inefficiencies, embodied by AI algorithms that successfully predict future market movements, are a prerequisite for the exploitation of economic inefficiencies through algorithmic trading (see Part IV). Being able to systematically exploit statistical and economic inefficiencies would contradict one of the established theories and cornerstones of finance: the efficient market hypothesis (EMH). The design of a successful trading bot can be considered the holy grail in finance, to which AI might lead the way.
This book concludes by discussing the consequences of AI for the financial industry and the possibility of a financial singularity (see Part V). A technical appendix also shows how to build neural networks from scratch based on plain Python code and provides additional examples of their application (see Part VI).
The problem of applying AI to finance is not too dissimilar to the problem of applying AI to other fields. Some major breakthroughs in AI in the 2010s were made possible by the application of reinforcement learning (RL) to playing arcade games, such as those from Atari published in the 1980s (see Mnih et al. 2013), and to board games such as Go (see Silver et al. 2016) and chess. Lessons learned from applying RL in gaming contexts, among other areas, are today applied to such challenging problems as designing and building autonomous vehicles or improving medical diagnostics. Table P-1 compares the application of AI and RL in different domains.

Table P-1. Comparison of AI in different domains

Arcade games
   Agent: AI agent (software)
   Goal: Maximizing game score
   Approach: RL in virtual gaming environment
   Reward: Points and scores
   Obstacle: Planning and delayed rewards
   Risks: None

Autonomous driving
   Agent: Self-driving car (software + car)
   Goal: Safely driving from location A to B
   Approach: RL in virtual (gaming) environment, real-world test drives
   Reward: Punishment for mistakes
   Obstacle: Transition from virtual to physical world
   Risks: Damaging property, harming people

Financial trading
   Agent: Trading bot (software)
   Goal: Maximizing long-term performance
   Approach: RL in virtual trading environment
   Reward: Financial returns
   Obstacle: Efficient markets and competition
   Risks: Financial losses

The beauty of training AI agents to play arcade games lies in the availability of a perfect virtual learning environment (see the Arcade Learning Environment) and the absence of any kind of risk. With autonomous vehicles, the major problem arises when transitioning from virtual learning environments (for example, a computer game such as Grand Theft Auto) to the physical world, with a self-driving car navigating real streets populated by other cars and people. This leads to serious risks, such as a car causing accidents or harming people. For a trading bot, RL training can also take place completely virtually, that is, in a simulated financial market environment.
The major risks that arise from malfunctioning trading bots are financial losses and, on an aggregate level, potential systemic risks due to herding by trading bots. Overall, however, the financial domain seems like an ideal place to train, test, and deploy AI algorithms. Given the rapid developments in the field, it should even be possible for an interested and ambitious student, equipped with a notebook computer and internet access, to successfully apply AI in a financial trading context. Beyond the hardware and software improvements of recent years, this is due primarily to the rise of online brokers that supply historical and real-time financial data and that allow the execution of financial trades via programmatic APIs.

The book is structured in the following six parts.

Part I
   The first part discusses central notions and algorithms of AI in general, such as supervised learning and neural networks (see Chapter 1). It also discusses the concept of superintelligence, which relates to an AI agent that possesses human-level intelligence and, in some domains, superhuman-level intelligence (see Chapter 2). Not every researcher in AI believes that superintelligence is possible in the foreseeable future. However, the discussion of this idea provides a valuable framework for discussing AI in general and AI for finance in particular.

Part II
   The second part consists of four chapters and is about traditional, normative finance theory (see Chapter 3) and how the field is transformed by data-driven finance (see Chapter 4) and machine learning (ML) (see Chapter 5). Taken together, data-driven finance and ML give rise to a model-free, AI-first approach to finance, as discussed in Chapter 6.

Part III
   The third part is about discovering statistical inefficiencies in financial markets by applying deep learning, neural networks, and reinforcement learning. The part covers dense neural networks (DNNs, see Chapter 7), recurrent neural networks (RNNs, see Chapter 8), and algorithms from reinforcement learning (RL, see Chapter 9) that in turn often rely on DNNs to represent and approximate the optimal policy of the AI agent.

Part IV
   The fourth part discusses how to exploit statistical inefficiencies through algorithmic trading. Topics are vectorized backtesting (see Chapter 10), event-based backtesting and risk management (see Chapter 11), and the execution and deployment of AI-powered algorithmic trading strategies (see Chapter 12).

Part V
   The fifth part is about the consequences that arise from AI-based competition in the financial industry (see Chapter 13). It also discusses the possibility of a financial singularity, a point in time at which AI agents would dominate all aspects of finance as we know it. The discussion in this context focuses on artificial financial intelligences as trading bots that consistently generate trading profits above any human or institutional benchmark (see Chapter 14).

Part VI
   The appendixes contain Python code for interactive neural network training (see Appendix A), classes for simple and shallow neural networks that are implemented from scratch based on plain Python code (see Appendix B), and an example of how to use convolutional neural networks (CNNs) for financial time series prediction (see Appendix C).

Author’s Note

The application of AI to financial trading is still a nascent field, although at the time of writing a number of other books cover this topic to some extent. Many of these publications, however, fail to show what it means to economically exploit statistical inefficiencies.

Some hedge funds already claim to rely exclusively on machine learning to manage their investors’ capital. A prominent example is The Voleon Group, a hedge fund that reported more than $6 billion in assets under management at the end of 2019 (see Lee and Karsh 2020). The difficulty of relying on machine learning to outsmart the financial markets is reflected in the fund’s performance of 7% for 2019, a year during which the S&P 500 stock index rose by almost 30%.

This book is based on years of practical experience in developing, backtesting, and deploying AI-powered algorithmic trading strategies. The approaches and examples presented are mostly based on my own research since the field is, by nature, not only nascent but also rather secretive. The exposition and the style throughout this book are relentlessly practical, and in many instances the concrete examples lack proper theoretical support and/or comprehensive empirical evidence.

This book even presents some applications and examples that might be vehemently criticized by experts in finance and/or machine learning. For example, some experts in machine and deep learning, such as François Chollet (2017), doubt outright that prediction in financial markets is possible. Certain experts in finance, such as Robert Shiller (2015), doubt that there will ever be something like a financial singularity. Others active at the intersection of the two domains, such as Marcos López de Prado (2018), argue that the use of machine learning for financial trading and investing requires an industrial-scale effort with large teams and huge budgets.

This book does not try to provide a balanced view of, or a comprehensive set of references for, all the topics covered. The presentation is driven by the personal opinions and experiences of the author, as well as by practical considerations when providing concrete examples and Python code. Many of the examples are also chosen and tweaked to drive home certain points or to show encouraging results. Therefore, it can certainly be argued that results from many examples presented in the book suffer from data snooping and overfitting (for a discussion of these topics, see Hilpisch 2020, ch. 4).
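The data-snooping concern can be made concrete with a small simulation. The sketch below is not taken from the book; it is a hypothetical illustration under stated assumptions: daily log returns follow a driftless random walk (so nothing is predictable by construction), and a hypothetical family of 100 simple rules (momentum or contrarian positions based on the sum of the previous 1 to 50 returns) is searched for the best in-sample performer. The function name `strategy_returns`, the sample length, and the seed are illustrative choices only.

```python
import numpy as np


def strategy_returns(returns: np.ndarray, window: int, direction: int) -> np.ndarray:
    """Per-period log returns of a hypothetical rule that goes long (+1) or short (-1)
    based on the sign of the sum of the previous `window` returns.

    direction=+1 follows the recent trend (momentum); direction=-1 bets against it
    (contrarian). Only data up to t-1 is used to set the position for period t.
    """
    cumsum = np.concatenate(([0.0], np.cumsum(returns)))
    # Sum of returns[t - window:t] for every t from `window` to len(returns) - 1.
    past_sum = cumsum[window:-1] - cumsum[:-window - 1]
    positions = direction * np.sign(past_sum)
    return positions * returns[window:]


rng = np.random.default_rng(seed=100)  # arbitrary seed for reproducibility
n_days = 1000                          # assumed length of each sample

# Driftless random walk: by construction, past returns carry no information.
returns = rng.normal(loc=0.0, scale=0.01, size=2 * n_days)
train, test = returns[:n_days], returns[n_days:]

# Hypothetical search space: 50 look-back windows x 2 directions = 100 rules.
candidates = [(w, d) for w in range(1, 51) for d in (1, -1)]
in_sample = np.array([strategy_returns(train, w, d).sum() for w, d in candidates])

# "Snooping": keep the rule that happened to look best on the training data.
best = int(np.argmax(in_sample))
w_best, d_best = candidates[best]
out_of_sample = strategy_returns(test, w_best, d_best).sum()

print(f"rules tried:              {len(candidates)}")
print(f"selected rule:            window={w_best}, direction={d_best:+d}")
print(f"in-sample log return:     {in_sample[best]:+.4f}")
print(f"out-of-sample log return: {out_of_sample:+.4f}")
```

Because each contrarian rule exactly mirrors a momentum rule, the selected rule's in-sample log return can never be negative, while its expected out-of-sample return on a random walk is zero. The exact figures depend on the seed; the typical gap between the two numbers is the kind of selection bias the paragraph above warns about.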
The major goal of this book is to empower the reader to use the code examples in the book as a framework to explore the exciting space of AI applied to financial trading. To achieve this goal, the book relies throughout on a number of simplifying assumptions and primarily on financial time series data and features derived directly from such data. In practical applications, a restriction to financial time series data is of course not necessary; a great variety of other types of data and data sources could be used as well.

This book’s approach to deriving features implicitly assumes that financial time series and features derived from them show patterns that, at least to some extent, persist over time and that can be used to predict the direction of future price movements. Against this background, all examples and code presented in this book are technical and illustrative in nature and do not represent any recommendation or investment advice.

For those who want to deploy the approaches and algorithmic trading strategies presented in this book, my book Python for Algorithmic Trading: From Idea to Cloud Deployment (O’Reilly) provides more process-oriented and technical details. The two books complement each other in many respects. For readers who are just getting started with Python for finance or who are seeking a refresher and reference manual, my book Python for Finance: Mastering Data-Driven Finance (O’Reilly) covers a comprehensive set of important topics and fundamental skills in Python as applied to the financial domain.

References

Papers and books cited in the preface:

Chollet, François. 2017. Deep Learning with Python. Shelter Island: Manning.
Hilpisch, Yves. 2018. Python for Finance: Mastering Data-Driven Finance. 2nd ed. Sebastopol: O’Reilly.
Hilpisch, Yves. 2020. Python for Algorithmic Trading: From Idea to Cloud Deployment. Sebastopol: O’Reilly.
Lee, Justina, and Melissa Karsh. 2020. “Machine-Learning Hedge Fund Voleon Group Returns 7% in 2019.” Bloomberg, January 21, 2020. https://oreil.ly/TOQiv.
López de Prado, Marcos. 2018. Advances in Financial Machine Learning. Hoboken, NJ: John Wiley & Sons.
Mnih, Volodymyr, et al. 2013. “Playing Atari with Deep Reinforcement Learning.” arXiv, December 19. https://oreil.ly/-pW-1.
Shiller, Robert. 2015. “The Mirage of the Financial Singularity.” Yale Insights, July 16. https://oreil.ly/VRkP3.
Silver, David, et al. 2016. “Mastering the Game of Go with Deep Neural Networks and Tree Search.” Nature 529 (January): 484-489.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
   Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
   Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
   Shows commands or other text that should be typed literally by the user.
Constant width italic
   Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.
This element signifies a general note.
This element indicates important information.
This element indicates a warning or caution.
Using Code Examples

You can access and execute the code that accompanies the book on the Quant Platform at https://aiif.pqp.io, for which only a free registration is required.

If you have a technical question or a problem using the code examples, please send an email to bookquestions@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example, this book may be attributed as: “Artificial Intelligence in Finance by Yves Hilpisch (O’Reilly). Copyright 2021 Yves Hilpisch, 978-1-492-05543-3.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators shares their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com.
How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/ai-in-finance.

Email bookquestions@oreilly.com to comment or ask technical questions about this book.

For news and information about our books and courses, visit http://oreilly.com.

Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

I want to thank the technical reviewers (Margaret Maynard-Reid, Dr. Tim Nugent, and Dr. Abdullah Karasan), who did a great job in helping me improve the contents of the book.

Delegates of the Certificate Programs in Python for Computational Finance and Algorithmic Trading also helped improve this book. Their ongoing feedback has enabled me to weed out errors and mistakes and refine the code and notebooks used in our online training classes and now, finally, in this book.

The same holds true for the team members of The Python Quants and The AI Machine. In particular, Michael Schwed, Ramanathan Ramakrishnamoorthy, and Prem Jebaseelan support me in numerous ways. They are the ones who assist me with the difficult technical problems that arise during the writing of a book like this one.

I would also like to thank the whole team at O’Reilly Media, especially Michelle Smith, Corbin Collins, Victoria DeRose, and Danny Elfanbaum, for making it all happen and helping me refine the book in so many ways. Of course, all remaining errors are mine alone.
Furthermore, I would like to thank the team at Refinitiv, in particular Jason Ramchandani, for providing ongoing support and access to financial data. The major data files used throughout the book and made available to the readers were received in one way or another from Refinitiv’s data APIs.

Of course, everybody making use of artificial intelligence and machine learning today benefits from the achievements and contributions of so many others. Therefore, we should always recall what Sir Isaac Newton wrote in 1675: “If I have seen further it is by standing on the shoulders of Giants.” In that sense, a big thank you to all the researchers and open source maintainers contributing to the field.

Finally, special thanks go to my family, who support me all year round in my business and book-writing activities. In particular, I thank my wife Sandra for relentlessly taking care of us all and for providing us with a home and environment that we all love so much.

I dedicate this book to my lovely wife Sandra and my wonderful son Henry.