📄 Page
1
(This page has no text content)
📄 Page
2
Efficient MySQL Performance Best Practices and Techniques Daniel Nichter
📄 Page
3
Efficient MySQL Performance by Daniel Nichter Copyright © 2022 Daniel Nichter. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com. Acquisitions Editor: Andy Kwan Development Editor: Corbin Collins Production Editor: Katherine Tozer Copyeditor: Justin Billing Proofreader: Piper Editorial Consulting, LLC Indexer: Amnet Systems LLC Interior Designer: David Futato Cover Designer: Karen Montgomery Illustrator: Kate Dullea December 2021: First Edition Revision History for the First Edition
📄 Page
4
2021-11-30: First Release See http://oreilly.com/catalog/errata.csp?isbn=9781098105099 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Efficient MySQL Performance, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights. 978-1-098-10509-9 [LSI]
📄 Page
5
Preface A gap in MySQL literature exists between basic MySQL knowledge and advanced MySQL performance. There are several books about the former, and one book about the latter: High Performance MySQL, 4th Edition, by Silvia Botros and Jeremy Tinley (O’Reilly). This is the first book to bridge the gap. The gap exists because MySQL is complex, and it’s difficult to teach performance without addressing that complexity—the proverbial elephant in the room. But engineers using (not managing) MySQL should not need to become MySQL experts to achieve remarkable MySQL performance. To bridge the gap, this book is unapologetically efficient—pay no attention to the elephant; it’s friendly. Efficient MySQL performance means focus: learning and applying only the best practices and techniques that directly affect remarkable MySQL performance. Focus dramatically narrows the scope of MySQL complexity and allows me to show you a much simpler and faster path through the vast and complex field of MySQL performance. The journey begins with the first sentence of Chapter 1, “Performance is query response time.” From there, we move fast through indexes, data, access patterns, and a lot more. On a scale of one to five—where one is an introduction for anyone, and five is a deep dive for aspiring experts—this book ranges from three to four: deep, but far from the bottom. I presume that you’re an experienced engineer who has basic knowledge of and experience with a relational database (MySQL or otherwise), so I do not explain SQL or database fundamentals. I presume that you’re an accomplished programmer who is responsible for one or more applications that use MySQL, so I continually reference the application and trust that you know the details of your application. I also presume that you’re familiar with computers in general, so I talk freely about hardware, software, networks, and so forth.
📄 Page
6
Since this book focuses on MySQL performance for engineers using MySQL, not managing it, a few references to MySQL configuration are made when necessary but not explained. For help configuring MySQL, ask a DBA where you work. If you don’t have a DBA, hire a MySQL consultant—there are many great consultants with affordable contract options. You can also learn by reading the MySQL Reference Manual. The MySQL manual is superb and experts use it all the time, so you’re in good company. Conventions Used in This Book The following typographical conventions are used in this book: Italic Indicates new terms, URLs, email addresses, filenames, and file extensions. Constant width Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords. Constant width bold Shows commands or other text that should be typed literally by the user. Constant width italic Shows text that should be replaced with user-supplied values or by values determined by context. TIP This element signifies a tip or suggestion.
📄 Page
7
NOTE This element signifies a general note. WARNING This element indicates a warning or caution. Using Code Examples Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/efficient-mysql-performance. If you have a technical question or a problem using the code examples, please email bookquestions@oreilly.com. This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission. We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Efficient MySQL Performance by Daniel Nichter (O’Reilly). Copyright 2022 Daniel Nichter, 978-1-098-10509-9.” If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.
📄 Page
8
O’Reilly Online Learning NOTE For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed. Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com. How to Contact Us Please address comments and questions concerning this book to the publisher: O’Reilly Media, Inc. 1005 Gravenstein Highway North Sebastopol, CA 95472 800-998-9938 (in the United States or Canada) 707-829-0515 (international or local) 707-829-0104 (fax) We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/efficient- mysql-performance.
📄 Page
9
Email bookquestions@oreilly.com to comment or ask technical questions about this book. For news and information about our books and courses, visit http://oreilly.com. Follow us on Twitter: http://twitter.com/oreillymedia Watch us on YouTube: http://youtube.com/oreillymedia Acknowledgments Thank you to the MySQL experts who reviewed this book: Vadim Tkachenko, Frédéric Descamps, and Fernando Ipar. Thank you to the MySQL experts who reviewed parts of this book: Marcos Albe, Jean- François Gagné and Kenny Gryp. Thank you to many other MySQL experts who have helped me, taught me, and provided opportunities over the years: Peter Zaitsev, Baron Schwartz, Ryan Lowe, Bill Karwin, Emily Slocombe, Morgan Tocker, Shlomi Noach, Jeremy Cole, Laurynas Biveinis, Mark Callaghan, Domas Mituzas, Ronald Bradford, Yves Trudeau, Sveta Smirnova, Alexey Kopytov, Jay Pipes, Stewart Smith, Aleksandr Kuzminsky, Alexander Rubin, Roman Vynar, and—again—Vadim Tkachenko. Thank you to O’Reilly and my editors: Corbin Collins, Katherine Tozer, Andy Kwan, and all the people behind the scenes. And thank you to my wife, Moon, who supported me during the time- consuming process of writing this book.
📄 Page
10
Chapter 1. Query Response Time Performance is query response time. This book explores that idea from various angles with a single intent: to help you achieve remarkable MySQL performance. Efficient MySQL performance means focusing on the best practices and techniques that directly affect MySQL performance—no superfluous details or deep internals required by DBAs and experts. I presume that you’re a busy professional who is using MySQL, not managing it, and that you need the most results for the least effort. That’s not laziness, that’s efficiency. To that end, this book is direct and to the point. And by the end, you will be able to achieve remarkable MySQL performance. MySQL performance is a complex and multifaceted subject, but you do not need to become an expert to achieve remarkable performance. I narrow the scope of MySQL complexity by focusing on the essentials. MySQL performance begins with query response time. Query response time is how long it takes MySQL to execute a query. Synonymous terms are: response time, query time, execution time, and (inaccurately) query latency. Timing starts when MySQL receives the query and ends when it has sent the result set to the client. Query response time comprises many stages (steps during query execution) and waits (lock waits, I/O waits, and so on), but a complete and detailed breakdown is neither possible nor necessary. As with many systems, basic troubleshooting and analysis reveal the majority of problems. 1
📄 Page
11
NOTE Performance increases as query response time decreases. Improving query response time is synonymous with reducing query response time. This chapter is the foundation. It expounds query response time so that, in subsequent chapters, you can learn how to improve it. There are seven major sections. The first is a true story to motivate and amuse. The second discusses why query response time is the North Star of MySQL performance. The third outlines how query metrics are transformed into meaningful reports: query reporting. The fourth addresses query analysis: using query metrics and other information to understand query execution. The fifth maps out the journey of improving query response time: query optimization. The sixth gives an honest and modest schedule for optimizing queries. The seventh discusses why MySQL cannot simply go faster—why query optimization is necessary. A True Story of False Performance In 2004, I was working the night shift at a data center—2 p.m. to midnight. It was a great job for two reasons. First, the only employees in the data center after 5 p.m. were a handful of engineers monitoring and managing thousands of physical servers for an undisclosed number of customers and websites—probably tens of thousands of websites. It was an engineer’s dream. Second, there were countless MySQL servers that always had problems to fix. It was a gold mine of learning and opportunity. But at the time, there were few books, blogs, or tools about MySQL. (Though that same year, O’Reilly published the first edition of High Performance MySQL.) Consequently, the state of the art for “fixing” MySQL performance problems was “sell the customer more RAM.” For sales and management it always worked, but for MySQL the results were inconsistent.
📄 Page
12
One night I decided not to sell the customer more RAM and, instead, to do a technical deep dive to find and fix the true root cause of their MySQL performance problem. Their database was powering a bulletin board which had slowed to a crawl under the weight of its success—still a common problem today, almost 20 years later. To make a long story short, I found a single query missing a critical index. After properly indexing the query, performance improved dramatically and the website was saved. It cost the customer zero dollars. Not all performance problems and solutions are that straightforward and glamorous. But just shy of 20 years’ experience with MySQL has taught me (and many others) that MySQL performance problems are very often solved by the best practices and techniques in this book. North Star I’m a MySQL DBA and a software engineer, so I know what it’s like working with MySQL as the latter. Especially when it comes to performance, we (software engineers) just want it (MySQL) to work. Between shipping features and putting out fires, who has time for MySQL performance? And when MySQL performance is poor—or worse: when it suddenly becomes poor—the way forward can be difficult to see because there are many considerations: where do we begin? Do we need more RAM? Faster CPUs? More storage IOPS? Is the problem a recent code change? (Fact: code changes deployed in the past can cause performance problems in the future, sometimes days in the future.) Is the problem a noisy neighbor? Are the DBAs doing something to the database? Has the app has gone viral and it’s the good kind of problem to have? As an engineer whose expertise is the application, not MySQL, that situation can be overwhelming. To move forward confidently, start by looking at query response time because it is meaningful and actionable. These are powerful qualities that lead to real solutions: Meaningful
📄 Page
13
Query response time is the only metric anyone truly cares about because, let’s be honest, when the database is fast, nobody looks at it or asks questions. Why? Because query response time is the only metric we experience. When a query takes 7.5 seconds to execute, we experience 7.5 seconds of impatience. That same query might examine a million rows, but we don’t experience a million rows examined. Our time is precious. Actionable There’s so much you can do to improve query response time and make everyone happy again that you’re holding a book about it. (Do people still hold books in the future? I hope so.) Query response time is directly actionable because you own the code, so you can change the queries. Even if you don’t own the code (or have access to it), you can still indirectly optimize query response time. “Improving Query Response Time” addresses direct and indirect query optimization. Focus on improving query response time—the North Star of MySQL performance. Do not begin by throwing hardware at the problem. Begin by using query metrics to determine what MySQL is doing, then analyze and optimize slow queries to reduce response time, and repeat. Performance will improve. Query Reporting Query metrics provide invaluable insights into query execution: response time, lock time, rows examined, and so on. But query metrics, like all metrics, are raw values that need to be collected, aggregated, and reported in a way that’s meaningful to (and readable for) engineers. That’s what this section outlines: how query metric tools transform query metrics into query reports. But query reporting is only a means to an end, as discussed in “Query Analysis”.
📄 Page
14
Looking ahead, query analysis is the real work: analyzing query metrics (as reported) and other information with the goal of understanding query execution. To improve MySQL performance, you must optimize queries. To optimize queries, you must understand how they execute. And to understand that, you must analyze them with pertinent information, including query reports and metadata. But first you need to understand query reporting, since it represents the trove of query metrics that provide invaluable insights into query execution. The next three sections teach you about the following: Sources: query metrics originate from two sources and vary by MySQL distribution and version Aggregation: query metric values are grouped and aggregated by normalized SQL statements Reporting: query reports are organized by a high-level profile and a query-specific report Then you’re ready for “Query Analysis”. NOTE This is not a book about database administration, so this section does not discuss the setup and configuration of query metrics in MySQL. I presume this is already done or will be done. If not, don’t worry: ask your DBA, hire a consultant, or learn how by reading the MySQL manual. Sources Query metrics originate from the slow query log or the Performance Schema. As the names indicate, the former is a log file on disk, and the latter is a database with the same name: performance_schema. Although completely different in nature (log file on disk as opposed to tables in a database), both provide query metrics. The important difference
📄 Page
15
is how many metrics they provide: apart from query response time, which both provide, the number of metrics ranges from 3 to more than 20. NOTE The name slow query log is historical. Long ago, MySQL logged only queries that took greater than N seconds to execute, and the minimum value for N was 1. Old versions of MySQL would not log a query that took 900 milliseconds to execute because that was “fast.” The slow query log really earned its name. Today, the minimum value is zero with a resolution of microseconds. When set to zero, MySQL logs every query executed. Therefore, the name is a little misleading, but now you know why. All things considered, the Performance Schema is the best source of query metrics because it exists in every current version and distribution of MySQL, it works locally and in the cloud, it provides all nine metrics covered in “Query Metrics”, and it’s the most consistent. Plus, the Performance Schema contains a wealth of other data for deep MySQL analysis, so its usefulness extends far beyond query metrics. The slow query log is a good source, too, but it varies considerably: MySQL As of MySQL 8.0.14, enable system variable log_slow_extra and the slow query log provides six of the nine metrics in “Query Metrics”, lacking only Rows_affected, Select_scan, and Select_full_join. It’s still a good source, but use the Performance Schema if possible. Before MySQL 8.0.14, which includes MySQL 5.7, the slow query log is bare bones, providing only Query_time, Lock_time, Rows_sent, and Rows_examined. You can still analyze queries with only these four metrics, but the analysis is much less insightful. Consequently, avoid the slow query log before MySQL 8.0.14 and instead use the Performance Schema. Percona Server
📄 Page
16
Percona Server provides a significantly greater number of metrics in the slow query log when system variable log_slow_verbosity is configured: all nine metrics covered in “Query Metrics” and more. It also supports query sampling (logging a percentage of queries) when system variable log_slow_rate_limit is configured, which is helpful for busy servers. These features make the Percona Server slow query log a great source. See “Slow Query Log” in the Percona Server manual for details. MariaDB Server MariaDB Server 10.x uses the Percona Server slow query log enhancements, but there are two notable differences: system variable log_slow_verbosity is configured differently in MariaDB, and it does not provide metric Rows_affected. Otherwise, it’s essentially the same and a great source, too. See “Slow Query Log Extended Statistics” in the MariaDB knowledge base for details. The slow query log is disabled by default, but you can enable it dynamically (without restarting MySQL). The Performance Schema should be enabled by default, though some cloud providers disable it by default. Unlike the slow query log, the Performance Schema cannot be enabled dynamically—you must restart MySQL to enable it. Make sure the best query metric source is used and properly configured. Ask your DBA, hire a consultant, or learn how by reading the MySQL manual. WARNING The slow query log can log all queries when long_query_time is set to zero, but be careful: on a busy server, this can increase disk I/O and use a significant amount of disk space. Aggregation
📄 Page
17
Query metrics are grouped and aggregated by query. That sounds obvious since they’re called query metrics, but some query metric tools can group by username, hostname, database, and so on. These alternate groupings are exceptionally rare and yield a different type of query analysis, so I don’t address them in this book. Since query response time is the North Star of MySQL performance, grouping query metrics by query is the best way to see which queries have the slowest response time, which forms the basis of query reporting and analysis. There’s one little problem: how do you uniquely identify queries to determine the groups to which they belong? For example, system metrics (CPU, memory, storage, and so on) are grouped by hostname because hostnames are unique and meaningful. But queries don’t have any uniquely identifying properties like hostname. The solution: a SHA-256 hash of the normalized SQL statement. Example 1-1 shows how a SQL statement is normalized. Example 1-1. SQL statement normalization SELECT col FROM tbl WHERE id=1 SELECT `col` FROM `tbl` WHERE `id` = ? f49d50dfab1c364e622d1e1ff54bb12df436be5d44c464a4e25a1ebb80fc2f13 SQL statement (sample) Digest text (normalized SQL statement) Digest hash (SHA-256 of digest text) MySQL normalizes SQL statements to digest texts, then computes the SHA-256 hash of the digest text to yield the digest hash. (It’s not necessary to understand the full process of normalization; it’s sufficient to know that normalization replaces all values with ? and collapses multiple whitespaces to a single space.) Since the digest text is unique, the digest hash is also unique (hash collisions notwithstanding).
📄 Page
18
NOTE The MySQL manual uses the term digest equivocally to mean either digest text or digest hash. Since the digest hash is computed from the digest text, the equivocation is only a language ambiguity, not a technical error. Please allow me to equivocate, too, and use digest to mean either digest text or digest hash when the technical difference doesn’t matter. There is an important shift in terminology in the context of query metrics: the term query changes to be synonymous with digest text. The shift in terminology aligns with the shift in focus: grouping metrics by query. To group by query, query must be unique, which is only true of digests. SQL statements are also called query samples (or samples for short), and they may or may not be reported. For security, most query metric tools discard samples by default (because they contain real values) and report only digest texts and hashes. Samples are required for query analysis because you can EXPLAIN them, which produces metadata necessary for understanding query execution. Some query metric tools EXPLAIN a sample, then discard it, and report the EXPLAIN plan (the output of EXPLAIN). Others only report the sample, which is still very convenient: copy-paste to EXPLAIN. If you have neither, then manually extract samples from the source or manually write them when needed. Two more clarifications about terminology and then I promise we’ll move on to more exciting material. First, terminology varies widely depending on the query metric tool, as shown in Table 1-1.
📄 Page
19
T a b l e 1 - 1 . Q u e r y m e t r i c t e r m i n o l o g y
📄 Page
20
Official (MySQL) Alternatives SQL statement Query Sample Query Digest text Class, family, fingerprint, query Digest hash Class ID, query ID, signature Second, another term that originated from Percona is query abstract: a SQL statement highly abstracted to its SQL command and table list. Example 1-2 is the query abstract for SELECT col FROM tbl WHERE id=1. Example 1-2. Query abstract SELECT tbl Query abstracts are not unique, but they are useful because they’re succinct. Usually, developers only need to see a query abstract to know the full query that it represents. Brevity is the soul of wit. —William Shakespeare It’s important to understand that SQL statements are normalized because the queries you write are not the queries you see. Most of the time, this is not a problem because digest texts closely resemble SQL statements. But the process of normalization raises another important point: do not dynamically generate the same logical query with different syntax, else it will normalize to different digests and be reported as different queries. For example, in the case of a programmatically-generated query that changes the WHERE clause based on user input: SELECT name FROM captains WHERE last_name = 'Picard' SELECT name FROM captains WHERE last_name = 'Picard' AND first_name = 'Jean-Luc'