
Author: Micha Gorelick, Ian Ozsvald

Your Python code may run correctly, but what if you need it to run faster? This practical book shows you how to locate performance bottlenecks and significantly speed up your code in high-data-volume programs. By explaining the fundamental theory behind design choices, this expanded edition helps experienced Python programmers gain a deeper understanding of Python's implementation.

Publisher: O'Reilly Media, Inc.
Publish Year: 2025
Language: English
High Performance Python
THIRD EDITION
Practical Performant Programming for Humans
Micha Gorelick and Ian Ozsvald
Foreword by Hilary Mason
High Performance Python
by Micha Gorelick and Ian Ozsvald

Copyright © 2025 Micha Gorelick and Ian Ozsvald. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Michelle Smith
Development Editor: Sara Hunter
Production Editor: Clare Laylock
Copyeditor: Dwight Ramsey
Proofreader: Arthur Johnson
Indexer: Potomac Indexing, LLC
Interior Designer: David Futato
Cover Designer: Jose Marzan
Illustrator: Kate Dullea

September 2014: First Edition
May 2020: Second Edition
May 2025: Third Edition

Revision History for the Third Edition
2025-04-29: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098165963 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. High Performance Python, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

High Performance Python is available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.

978-1-098-16596-3
[LSI]
Foreword

When you think about high performance computing, you might imagine giant clusters of machines modeling complex weather phenomena or trying to understand signals in data collected about far-off stars. It’s easy to assume that only people building specialized systems should worry about the performance characteristics of their code. By picking up this book, you’ve taken a step toward learning the theory and practices you’ll need to write highly performant code. Every programmer can benefit from understanding how to build performant systems.

There is an obvious set of applications that are just on the edge of possible, and you won’t be able to approach them without writing optimally performant code. If that’s your practice, you’re in the right place. But there is a much broader set of applications that can benefit from performant code. We often think that new technical capabilities are what drives innovation, but I’m equally fond of capabilities that increase the accessibility of technology by orders of magnitude. When something becomes ten times cheaper in time or compute costs, suddenly the set of applications you can address is wider than you imagined.

The first time this principle manifested in my own work was over a decade ago, when I was working at a social media company and we ran an analysis over multiple terabytes of data to determine whether people clicked on more photos of cats or dogs on social media. It was dogs, of course. Cats just have better branding.

This was an outstandingly frivolous use of compute time and infrastructure at the time! Gaining the ability to apply techniques that had previously been restricted to sufficiently high-value applications, such as fraud detection, to a seemingly trivial question opened up a new world of now-accessible possibilities. We were able to take what we learned from these experiments and build a whole new set of products in search and content discovery.

For an example that you might encounter today, consider a machine learning system that recognizes unexpected animals or people in security video footage. A sufficiently performant system could let you embed that model in the camera itself, improving privacy, or, even if it runs in the cloud, let it use significantly less compute and power, which benefits the environment and reduces your operating costs. This can free up resources for you to look at adjacent problems, potentially building a more valuable system.

We all desire to create systems that are effective, easy to understand, and performant. Unfortunately, it often feels like we have to pick only two (or one) out of the three! High Performance Python is a handbook for people who want to make things that are capable of all three.

This book stands apart from other texts on the subject in three ways. First, it’s written for us: humans who write code. You’ll find all of the context you need to understand why you might make certain choices. Second, Gorelick and Ozsvald do a wonderful job of curating and explaining the necessary theory to support that context. You’ll also learn the specific quirks of the most useful libraries for implementing these approaches today. This is one of a rare class of programming books that will change the way you think about the practice of programming. I’ve given this book to many people who could benefit from the additional tools it provides. The ideas that you’ll explore in its pages will make you a better programmer, no matter what language or environment you choose to work in.

Enjoy the adventure.

Hilary Mason, Cofounder & CEO at Hidden Door
Preface

Python is easy to learn. You’re probably here because now that your code runs correctly, you need it to run faster. You like the fact that your code is easy to modify and that you can iterate on ideas quickly. The trade-off between “easy to develop” and “runs as quickly as I need” is a well-understood and often-bemoaned phenomenon. There are solutions.

Some people have serial processes that have to run faster. Others have problems that could take advantage of multicore architectures, clusters, or graphics processing units. Some need scalable systems that can process more or less as expediency and funds allow, without losing reliability. Others will realize that their coding techniques, often borrowed from other languages, perhaps aren’t as natural as examples they see from others.

In this book we will cover all of these topics, giving practical guidance for understanding bottlenecks and producing faster and more scalable solutions. We also include some war stories from those who went ahead of you, who took the knocks so you don’t have to.

Python is well suited for rapid development, production deployments, and scalable systems. The ecosystem is full of people who are working to make it scale on your behalf, leaving you more time to focus on the more challenging tasks around you.
Who This Book Is For

You’ve used Python for long enough to have an idea about why certain things are slow and to have seen technologies like Cython, numpy, and PyPy being discussed as possible solutions. You might also have programmed with other languages and so know that there’s more than one way to solve a performance problem.

While this book is primarily aimed at people with CPU-bound problems, we also look at data transfer and memory-bound solutions. Typically, these problems are faced by scientists, engineers, quants, and academics. We also look at problems that a web developer might face, including the movement of data and the use of just-in-time (JIT) compilers like PyPy and asynchronous I/O for easy-win performance gains.

It might help if you have a background in C (or C++, or maybe Java), but it isn’t a prerequisite. Python’s most common interpreter (CPython, the standard you normally get if you type python at the command line) is written in C, and so the hooks and libraries all expose the gory inner C machinery. There are lots of other techniques that we cover that don’t assume any knowledge of C. You might also have a lower-level knowledge of the CPU, memory architecture, and data buses, but again, that’s not strictly necessary.
Who This Book Is Not For

This book is meant for intermediate to advanced Python programmers. Motivated novice Python programmers may be able to follow along as well, but we recommend having a solid Python foundation.

We don’t cover storage-system optimization. If you have a SQL or NoSQL bottleneck, then this book probably won’t help you.

What You’ll Learn

Your authors have been working with large volumes of data, a requirement for “I want the answers faster!” and a need for scalable architectures, for many years in both industry and academia. We’ll try to impart our hard-won experience to save you from making the mistakes that we’ve made.

At the start of each chapter, we’ll list questions that the following text should answer. (If it doesn’t, tell us and we’ll fix it in the next revision!)

We cover the following topics:
Background on the machinery of a computer, so you know what’s happening behind the scenes
Lists and tuples: the subtle semantic and speed differences in these fundamental data structures
Dictionaries and sets: memory allocation strategies and access algorithms in these important data structures
Iterators: how to write in a more Pythonic way and open the door to infinite data streams using iteration
Pure Python approaches: how to use Python and its modules effectively
Matrices with numpy: how to use the beloved numpy library like a beast
Compilation and just-in-time computing: processing faster by compiling down to machine code, making sure you’re guided by the results of profiling
Concurrency: ways to move data efficiently
multiprocessing: various ways to use the built-in multiprocessing library for parallel computing and to efficiently share numpy matrices, and some costs and benefits of interprocess communication (IPC)
Cluster computing: convert your multiprocessing code to run on a local or remote cluster for both research and production systems
Using less RAM: approaches to solving large problems without buying a humongous computer
Lessons from the field: lessons encoded in war stories from those who took the blows so you don’t have to

Python 3

Python 3 is the standard version of Python as of 2020, with Python 2.7 deprecated after a 10-year migration process. If you’re still on Python 2.7, you’re doing it wrong: many libraries are no longer supported for your line of Python, and support will become more expensive over time. Please do the community a favor and migrate to Python 3, and make sure that all new projects use Python 3.

Unless stated otherwise, we’re using Python 3.12 (released 2023).

In this book, we use 64-bit Python. While 32-bit Python is supported, it is far less common for scientific work. We’d expect all the libraries to work as usual, but numeric precision, which depends on the number of bits available for counting, is likely to change. 64-bit is dominant in this field, along with *nix environments (often Linux or Mac). 64-bit lets you address larger amounts of RAM. *nix lets you build applications that can be deployed and configured in well-understood ways with well-understood behaviors.

If you’re a Windows user, you’ll have to buckle up. Most of what we show will work just fine, but some things are OS-specific, and you’ll have to research a Windows solution. The biggest difficulty a Windows user might face is the installation of modules: research on sites like Stack Overflow should give you the solutions you need. If you’re on Windows, having a virtual machine (e.g., using VirtualBox) with a running Linux installation might help you to experiment more freely.

Windows users should definitely look at a packaged solution like those available through Anaconda, Canopy, Python(x,y), or Sage. These same distributions will make the lives of Linux and Mac users far simpler too.

License

This book is licensed under Creative Commons Attribution-NonCommercial-NoDerivs 3.0. You’re welcome to use this book for noncommercial purposes, including for noncommercial teaching. The license allows only for complete reproductions; for partial reproductions, please contact O’Reilly (see “How to Contact Us”). Please attribute the book as noted in the following section.

We negotiated that the book should have a Creative Commons license so the contents could spread further around the world. We’d be quite happy to receive a beer if this decision has helped you. We suspect that the O’Reilly staff would feel similarly about the beer.

How to Make an Attribution

The Creative Commons license requires that you attribute your use of a part of this book. Attribution just means that you should write something that someone else can follow to find this book. The following would be sensible:

“High Performance Python, 3rd ed., by Micha Gorelick and Ian Ozsvald (O’Reilly). Copyright 2025 Micha Gorelick and Ian Ozsvald, 978-1-098-16596-3.”

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://oreil.ly/supp-hpp3e.
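The preface assumes 64-bit CPython and, unless stated otherwise, Python 3.12. Before running the supplemental examples, a quick environment check along the following lines may save some confusion. This is a minimal sketch using only the standard library (`platform`, `struct`, `sys`); the helper names `describe_interpreter` and `matches_book_setup` are hypothetical, and treating 3.12 as a minimum rather than an exact version is our assumption, not the book’s.

```python
import platform
import struct
import sys


def describe_interpreter() -> dict:
    """Summarize the interpreter properties the preface assumes (hypothetical helper)."""
    return {
        # "CPython" for the reference interpreter written in C, "PyPy" for PyPy, etc.
        "implementation": platform.python_implementation(),
        # (major, minor) version as a plain tuple, e.g. (3, 12).
        "version": tuple(sys.version_info[:2]),
        # Size of a C pointer in bits: 64 on a 64-bit build, 32 on a 32-bit build.
        "pointer_bits": struct.calcsize("P") * 8,
        # sys.maxsize is 2**63 - 1 on 64-bit builds and 2**31 - 1 on 32-bit builds.
        "is_64bit": sys.maxsize > 2**32,
    }


def matches_book_setup(info: dict, minimum: tuple = (3, 12)) -> bool:
    """Return True if info looks like 64-bit CPython at or above `minimum`.

    Using 3.12 as a lower bound is an assumption made for this sketch.
    """
    return (
        info["implementation"] == "CPython"
        and info["version"] >= minimum
        and info["is_64bit"]
    )


if __name__ == "__main__":
    info = describe_interpreter()
    for key, value in info.items():
        print(f"{key:>14}: {value}")
    print("matches book setup:", matches_book_setup(info))
```

Run as a script, it prints one line per property followed by an overall verdict. `struct.calcsize("P")` reports the size of a C pointer in the running interpreter, so it tells a 32-bit Python apart from a 64-bit one even when both are installed on a 64-bit operating system. Python’s own `int` is arbitrary precision either way; what the build changes is the size of C-level types such as pointers and, with them, `sys.maxsize`.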
If you have a technical question or a problem using the code examples, please email support@oreilly.com. This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission. We appreciate, but generally do not require, attribution (see the previous section for attribution). If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.
Errata and Feedback

We encourage you to review this book on public sites like Amazon; please help others understand if they would benefit from this book! You can also email us at feedback@highperformancepython.com.

We’re particularly keen to hear about errors in the book, successful use cases where the book has helped you, and high performance techniques that we should cover in the next edition. You can access the web page for this book at https://oreil.ly/supp-hpp3e.

Complaints are welcomed through the instant-complaint-transmission-service > /dev/null.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.