Reinforcement Learning for Finance
Author: Yves Hilpisch
Reinforcement learning (RL) has led to several breakthroughs in AI. The deep Q-learning (DQL) algorithm alone has enabled agents that play arcade games and board games at a superhuman level. More recently, RL, DQL, and similar methods have gained popularity in financial research publications. This book is among the first to explore the use of reinforcement learning methods in finance.
Reinforcement Learning for Finance
A Python-Based Introduction

Yves Hilpisch
Reinforcement Learning for Finance
by Yves Hilpisch

Copyright © 2025 Yves Hilpisch. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Acquisitions Editor: Michelle Smith
Development Editor: Corbin Collins
Production Editor: Beth Kelly
Copyeditor: Doug McNair
Proofreader: Heather Walley
Indexer: Judith McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

October 2024: First Edition

Revision History for the First Edition
2024-10-14: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098169145 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Reinforcement Learning for Finance, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the author and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-16914-5
[LSI]
Preface

Tell me and I forget. Teach me and I remember. Involve me and I learn.
—Benjamin Franklin

Reinforcement learning (RL) has enabled a number of breakthroughs in AI. One of the key algorithms in RL is deep Q-learning (DQL), which can be applied to a large number of dynamic decision problems. Popular examples are arcade games and board games, such as Go, in which RL and DQL algorithms have achieved superhuman performance in many instances. This has often happened despite the belief of experts that such feats would be impossible for decades to come.

Finance is a discipline with a strong connection between theory and practice. Theoretical advancements often find their way quickly into the applied domain. Many problems in finance are dynamic decision problems, such as the optimal allocation of assets over time. Therefore it is, on the one hand, theoretically interesting to apply DQL to financial problems. On the other hand, it is in general quite easy and straightforward to apply such algorithms—usually after some thorough testing—in the financial markets.
In recent years, financial research has seen strong growth in publications that apply RL, DQL, and related methods to finance. However, there is hardly any resource in book form—beyond the purely theoretical ones—for those who are looking for an applied introduction to this exciting field. This book closes that gap: it provides the required background in a concise fashion and otherwise focuses on the implementation of the algorithms in the form of self-contained Python code and on their application to important financial problems.

Target Audience

This book is intended as a concise, Python-based introduction to the major ideas and elements of RL and DQL as applied to finance. It should be useful to students and academics as well as to practitioners in search of alternatives to existing financial theories and algorithms. The book expects basic knowledge of the Python programming language, object-oriented programming, and the major Python packages used in data science and machine learning, such as NumPy, pandas, matplotlib, scikit-learn, and TensorFlow.
Overview of the Book

The book consists of the following chapters:

Chapter 1
The first chapter focuses on learning through interaction with four major examples: probability matching, Bayesian updating, RL, and DQL.

Chapter 2
The second chapter introduces concepts from dynamic programming (DP) and discusses DQL as an approach to approximate solutions to DP problems. The major theme is the derivation of optimal policies to maximize a given objective function through taking a sequence of actions and updating the optimal policy iteratively. DQL is illustrated on the basis of a DQL agent that learns to play the CartPole game from the Gymnasium Python package.

Chapter 3
The third chapter develops a first Finance environment that allows the DQL agent from Chapter 2 to learn a financial prediction game. Although the environment formally replicates the API of the CartPole game, it misses some important characteristics that are needed to apply RL successfully.

Chapter 4
The fourth chapter is about data augmentation based on Monte Carlo simulation (MCS) approaches, and it discusses the addition of noise to historical data and the simulation of stochastic processes.

Chapter 5
The fifth chapter introduces generative adversarial networks (GANs) to synthetically generate time series data whose statistical characteristics are similar to those of the historical time series data on which a GAN was trained.

Chapter 6
Building on the example from Chapter 3, this chapter applies DQL to the problem of algorithmic trading based on the prediction of the next price movement’s direction.

Chapter 7
The seventh chapter is about learning optimal dynamic hedging strategies for an option with European exercise in the Black-Scholes-Merton (1973) model. In other words, delta hedging or dynamic replication of the option is the goal.

Chapter 8
This chapter applies DQL to three canonical examples in asset management: one risky asset and one risk-free asset, two risky assets, and three risky assets. The problem is to dynamically allocate funds to the available assets to maximize a profit target or a risk-adjusted return (Sharpe ratio).

Chapter 9
The ninth chapter is about the optimal liquidation of a large position in a stock. Given a certain risk aversion, the total execution costs are to be minimized. This use case differs from the others in that all actions are tightly connected with each other through an additional constraint. The chapter also introduces an additional RL algorithm in the form of an actor-critic implementation.

Chapter 10
The final chapter of the book provides some concluding remarks and sketches out how the examples presented in the book can be improved upon.
About the Code in This Book

The code in this book is primarily developed using TensorFlow 2.13. Readers can run the code directly on The Python Quants’ Quant Platform with no additional installations required—only a free registration. This platform allows readers to effortlessly execute the code and reproduce the results as presented in the book. The code is also available for download to run locally. Future updates, such as support for newer TensorFlow versions, are planned. Additionally, the Quant Platform offers access to a user forum where readers can ask questions and receive support on all topics related to the book.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
Shows commands or other text that should be typed literally by the user.

Constant width italic
Shows text that should be replaced with user-supplied values or with values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.
Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://rl4f.pqp.io.

If you have a technical question or a problem using the code examples, please send email to bookquestions@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example, this book would be attributed as “Reinforcement Learning for Finance by Yves Hilpisch (O’Reilly). Copyright 2025 Yves Hilpisch, 978-1-098-16914-5.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com.
How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-827-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://oreilly.com/about/contact.html

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/RL-for-finance.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media
Watch us on YouTube: https://youtube.com/oreillymedia

Acknowledgments

The contents of this book evolved through a series of online webinars, classes within the CPF Program, and workshops at conferences across Europe and the USA. I extend my sincere thanks to all participants whose valuable feedback helped shape the final version of this work.

A special thank you goes to Dr. Ivilina Popova for her insightful feedback on the financial sections and the book as a whole. Her contributions were instrumental in refining the content.

I am also grateful to the entire O’Reilly team for their professionalism and ongoing support. Their constructive input and thoughtful suggestions led to significant improvements throughout the manuscript.

This book is dedicated to Sandra and Henry. To Sandra, for her unwavering love and support throughout this journey. To Henry, with the hope that this work will inspire him in his studies of data science and artificial intelligence, and fuel his passion for learning.
Part I. The Basics

The first part of the book covers the basics of reinforcement learning and provides background information. It consists of three chapters:

Chapter 1 focuses on learning through interaction with four major examples: probability matching, Bayesian updating, reinforcement learning (RL), and deep Q-learning (DQL).

Chapter 2 introduces concepts from dynamic programming (DP) and discusses DQL as an approach to approximate solutions to DP problems. The major theme is the derivation of optimal policies to maximize a given objective function through taking a sequence of actions and updating the optimal policy iteratively. DQL is illustrated based on the CartPole game from the Gymnasium Python package.

Chapter 3 develops a first Finance environment that allows the DQL agent from Chapter 2 to learn a financial prediction game. Although the environment formally replicates the API of the CartPole game, it misses some important characteristics that are needed to apply RL successfully.
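Because Chapters 2 and 3 both lean on the CartPole API from the Gymnasium package, a minimal interaction loop is sketched here for reference. This is a generic illustration of the standard Gymnasium interface, not code taken from the book; the random action choice simply stands in for whatever an agent would decide.

    import gymnasium as gym

    # create the CartPole environment and reset it to an initial state
    env = gym.make("CartPole-v1")
    observation, info = env.reset(seed=100)

    total_reward = 0.0
    done = False
    while not done:
        # a random action; a DQL agent would choose the action here instead
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated

    print(total_reward)  # number of steps the pole was kept upright

A Finance environment that “formally replicates the API” exposes the same reset() and step() pattern, only with financial data as observations and a prediction as the action.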
Chapter 1. Learning Through Interaction

The idea that we learn by interacting with our environment is probably the first to occur to us when we think about the nature of learning.
—Sutton and Barto (2018)

For human beings and animals alike, learning is almost as fundamental as breathing. It is something that happens continuously and most often unconsciously. There are different forms of learning. The one most important to the topics covered in this book is based on interacting with an environment. Interaction with an environment provides the learner—or agent henceforth—with feedback that can be used to update their knowledge or to refine a skill. In this book, we are mostly interested in learning quantifiable facts about an environment, such as the odds of winning a bet or the reward that an action yields.

The next section discusses Bayesian learning as an example of learning through interaction. “Reinforcement Learning” presents breakthroughs in AI that were made possible through RL. It also describes the major building blocks of RL. “Deep Q-Learning” explains the two major characteristics of DQL, which is the most important algorithm in the remainder of the book.

Bayesian Learning

Two examples illustrate learning by interacting with an environment: tossing a biased coin and rolling a biased die. The examples are based on the idea that an agent betting repeatedly on the outcome of a biased gamble (and remembering all outcomes) can learn bet-by-bet about the gamble’s bias and thereby about the optimal policy for betting. The idea, in that sense, makes use of Bayesian updating. Bayes’ theorem and Bayesian updating date back to the 18th century (Bayes and Price 1763). A modern and Python-based discussion of Bayesian statistics is found in Downey (2021).

Tossing a Biased Coin

Assume the simple game of betting on the outcome of tossing a biased coin. As a benchmark, consider the special case of an unbiased coin first. Agents are allowed to bet for free on the outcome of the coin tosses. An agent might, for example, bet
randomly on either heads or tails. The reward is 1 USD if the agent wins and nothing if the agent loses. The agent’s goal is to maximize the total reward. The following Python code simulates several sequences of 100 bets each:

In [1]: import numpy as np
        from numpy.random import default_rng
        rng = default_rng(seed=100)  # reproducible random number generator

In [2]: ssp = [1, 0]  # state space: the possible outcomes of a toss

In [3]: asp = [1, 0]  # action space: the possible bets of the agent

In [4]: def epoch():
            tr = 0  # total reward for one sequence of bets
            for _ in range(100):
                a = rng.choice(asp)  # the agent bets randomly
                s = rng.choice(ssp)  # the coin toss is random as well
                if a == s:
                    tr += 1  # reward of 1 (USD) for a correct bet
            return tr

In [5]: rl = np.array([epoch() for _ in range(250)])
        rl[:10]
Out[5]: array([56, 47, 48, 55, 55, 51, 54, 43, 55, ...])
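The code above establishes the benchmark of purely random betting on a fair coin, for which the expected total reward per sequence is about 50. As a preview of the learning idea described earlier—an agent that remembers all outcomes and learns the gamble’s bias bet by bet—the following sketch shows one possible way to implement Bayesian updating with a Beta prior. The bias value, the prior, and the function name are illustrative assumptions, not the book’s code.

    import numpy as np
    from numpy.random import default_rng

    rng = default_rng(seed=100)

    def learning_epoch(p_heads=0.8, n_bets=100):
        # Beta(1, 1) prior over the probability of heads, i.e., no initial knowledge
        alpha, beta = 1, 1
        tr = 0  # total reward for one sequence of bets
        for _ in range(n_bets):
            # bet on heads (1) if its posterior mean exceeds 0.5, otherwise on tails (0)
            a = 1 if alpha / (alpha + beta) > 0.5 else 0
            s = rng.binomial(1, p_heads)  # toss the biased coin
            if a == s:
                tr += 1
            # Bayesian update of the Beta posterior with the observed outcome
            alpha += s
            beta += 1 - s
        return tr

    rl = np.array([learning_epoch() for _ in range(250)])
    # close to 100 * max(p_heads, 1 - p_heads) = 80 once the posterior locks onto the bias
    print(rl.mean())

Conceptually, the same mechanics carry over to the biased die example mentioned above, with a prior over six outcomes instead of two.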