Designing Distributed Systems, 2nd Edition Patterns and Paradigms for Scalable, Reliable Systems Using Kubernetes (Brendan Burns) (Z-Library)

Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Systems Using Kubernetes
Brendan Burns
2nd Edition

ISBN: 978-1-098-15635-0
US $59.99 CAN $74.99
SOFTWARE ARCHITECTURE

Every distributed system strives for reliability, performance, and quality, but building such a system is hard. Establishing a set of design patterns enables software developers and system architects to use a common language to describe their systems and learn from the patterns and practices developed by others. The popularity of containers and Kubernetes paves the way for core distributed system patterns and reusable containerized components.

This practical guide presents a collection of repeatable, generic patterns, drawn from some of the highest-performing distributed systems in use today, to guide the systems you build. These common patterns make building distributed systems far more approachable and efficient, even if you’ve never built one before. Author Brendan Burns demonstrates how you can adapt existing software design patterns for designing and building reliable distributed applications. Systems engineers and application developers will learn how these long-established patterns provide a common language and framework for dramatically increasing the quality of their systems. This fully updated second edition includes new chapters on AI inference, AI training, and building robust systems for the real world.
• Understand how patterns and reusable components enable the rapid development of reliable distributed systems
• Use the sidecar, adapter, and ambassador patterns to split your application into a group of containers on a single machine
• Explore loosely coupled multinode distributed patterns for replication, scaling, and communication between components
• Learn distributed system patterns for large-scale batch data processing covering work queues, event-based processing, and coordinated workflows

“Learning about distributed systems can be hard, but by introducing standard, easy-to-understand-and-use patterns, Brendan Burns makes the task safer, easier, and more approachable.”
Anne Currie, CEO, Strategically Green Learning and Development, and author, Building Green Software

Brendan Burns is corporate vice president at Microsoft, responsible for Azure management and governance, Azure Arc, Kubernetes on Azure, Linux on Azure, and PowerShell. He lives in Seattle with his wife, two children, and cat, Mrs. Paws.

Praise for Designing Distributed Systems

An essential guide for anyone working with scalable and reliable systems, especially in the context of Kubernetes. Burns brings clarity to complex distributed systems concepts and provides practical design patterns, making this book invaluable for engineers looking to build robust systems in modern, cloud native environments.
Rajeev Reddy Vishaka, software engineering leader, Coinbase

Designing Distributed Systems by Brendan Burns offers an in-depth exploration of key distributed system concepts, from stateless and sharded services to event-driven processing and observability. A must-read for SREs and engineers looking to harness the full power of Kubernetes to build resilient, high-performance infrastructures.
Swapnil Shevate, site reliability engineering professional and advocate

A brilliant resource that simplifies the complexity of distributed systems. Brendan Burns offers practical patterns and design paradigms that are indispensable for building modern cloud native applications.
Lalithkumar Prakashchand, software engineer, Meta Platforms

Designing Distributed Systems, 2nd ed., remains an excellent book for introducing developers to architectural concepts that add both resilience and greater efficiency to new and legacy systems. The book describes a set of simple patterns that work particularly well with Kubernetes and are a great place to start building better systems in the field.
Anne Currie, CEO, Strategically Green Learning and Development, and author, Building Green Software


Brendan Burns Designing Distributed Systems Patterns and Paradigms for Scalable, Reliable Systems Using Kubernetes SECOND EDITION

978-1-098-15635-0
[LSI]

Designing Distributed Systems
by Brendan Burns

Copyright © 2025 Brendan Burns. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Louise Corrigan
Development Editor: Jill Leonard
Production Editor: Elizabeth Faerm
Copyeditor: Dwight Ramsey
Proofreader: Emily Wydeven
Indexer: Potomac Indexing, LLC
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

December 2024: Second Edition

Revision History for the Second Edition
2024-12-04: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098156350 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Designing Distributed Systems, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

Table of Contents

Preface

Part I. Foundational Concepts
1. Introduction
  A Brief History of Systems Development
  A Brief History of Patterns in Software Development
  Formalization of Algorithmic Programming
  Patterns for Object-Oriented Programming
  The Rise of Open Source Software
  The Value of Patterns, Practices, and Components
  Standing on the Shoulders of Giants
  A Shared Language for Discussing Our Practice
  Shared Components for Easy Reuse
  Summary
2. Important Distributed System Concepts
  APIs and RPCs
  Latency
  Reliability
  Percentiles
  Idempotency
  Delivery Semantics
  Relational Integrity
  Data Consistency
  Orchestration and Kubernetes
  Health Checks
  Summary

Part II. Single-Node Patterns
3. The Sidecar Pattern
  An Example Sidecar: Adding HTTPS to a Legacy Service
  Dynamic Configuration with Sidecars
  Modular Application Containers
  Hands On: Deploying the topz Container
  Building a Simple PaaS with Sidecars
  Designing Sidecars for Modularity and Reusability
  Parameterized Containers
  Define Each Container’s API
  Documenting Your Containers
  Summary
4. Ambassadors
  Using an Ambassador to Shard a Service
  Hands On: Implementing a Sharded Redis
  Using an Ambassador for Service Brokering
  Using an Ambassador to Do Experimentation or Request Splitting
  Hands On: Implementing 10% Experiments
  Summary
5. Adapters
  Monitoring
  Hands On: Using Prometheus for Monitoring
  Logging
  Hands On: Normalizing Different Logging Formats with fluentd
  Adding a Health Monitor
  Hands On: Adding Rich Health Monitoring for MySQL
  Summary

Part III. Serving Patterns
6. Replicated Load-Balanced Services
  Stateless Services
  Readiness Probes for Load Balancing
  Hands On: Creating a Replicated Service in Kubernetes
  Session Tracked Services
  Application-Layer Replicated Services
  Introducing a Caching Layer
  Deploying Your Cache
  Hands On: Deploying the Caching Layer
  Expanding the Caching Layer
  Rate Limiting and Denial-of-Service Defense
  SSL Termination
  Hands On: Deploying nginx and SSL Termination
  Summary
7. Sharded Services
  Sharded Caching
  Why You Might Need a Sharded Cache
  The Role of the Cache in System Performance
  Replicated Sharded Caches
  Hands On: Deploying an Ambassador and Memcache for a Sharded Cache
  An Examination of Sharding Functions
  Selecting a Key
  Consistent Hashing Functions
  Hands On: Building a Consistent HTTP Sharding Proxy
  Sharded Replicated Serving
  Hot Sharding Systems
  Summary
8. Scatter/Gather
  Scatter/Gather with Root Distribution
  Hands On: Distributed Document Search
  Scatter/Gather with Leaf Sharding
  Hands On: Sharded Document Search
  Choosing the Right Number of Leaves
  Scaling Scatter/Gather for Reliability and Scale
  Summary
9. Functions and Event-Driven Processing
  Determining When FaaS Makes Sense
  The Benefits of FaaS
  The Challenges of FaaS
  The Need for Background Processing
  The Need to Hold Data in Memory
  The Costs of Sustained Request-Based Processing
  Patterns for FaaS
  The Decorator Pattern: Request or Response Transformation
  Hands On: Adding Request Defaulting Prior to Request Processing
  Handling Events
  Hands On: Implementing Two-Factor Authentication
  Event-Based Pipelines
  Hands On: Implementing a Pipeline for New User Signup
  Summary
10. Ownership Election
  Determining If You Even Need Leader Election
  The Basics of Leader Election
  Hands On: Deploying etcd
  Implementing Locks
  Hands On: Implementing Locks in etcd
  Implementing Ownership
  Hands On: Implementing Leases in etcd
  Handling Concurrent Data Manipulation
  Summary

Part IV. Batch Computational Patterns
11. Work Queue Systems
  A Generic Work Queue System
  The Source Container Interface
  Work Queue API
  The Worker Container Interface
  The Shared Work Queue Infrastructure
  Hands On: Implementing a Video Thumbnailer
  Dynamic Scaling of the Workers
  The Multiworker Pattern
  Summary
12. Event-Driven Batch Processing
  Patterns of Event-Driven Processing
  Copier
  Filter
  Splitter
  Sharder
  Merger
  Hands On: Building an Event-Driven Flow for New User Signup
  Publisher/Subscriber Infrastructure
  Hands On: Deploying Kafka
  Resiliency and Performance in Work Queues
  Work Stealing
  Errors, Priority, and Retry
  Summary
13. Coordinated Batch Processing
  Join (or Barrier Synchronization)
  Reduce
  Hands On: Count
  Sum
  Histogram
  Hands On: An Image Tagging and Processing Pipeline
  Summary

Part V. Universal Concepts
14. Monitoring and Observability Patterns
  Monitoring and Observability Basics
  Logging
  Metrics
  Basic Request Monitoring
  Advanced Request Monitoring
  Alerting
  Tracing
  Aggregating Information
  Summary
15. AI Inference and Serving
  The Basics of AI Systems
  Hosting a Model
  Distributing a Model
  Development with Models
  Retrieval-Augmented Generation
  Testing and Deployment
  Summary
16. Common Failure Patterns
  The Thundering Herd
  The Absence of Errors Is an Error
  “Client” and “Expected” Errors
  Versioning Errors
  The Myth of Optional Components
  Oops, We “Cleaned Up” Everything
  Challenges with the Breadth of Inputs
  Processing Obsolete Work
  The “Second System” Problem
  Summary

Conclusion: A New Beginning?

Index

Preface

Who Should Read This Book

At this point, nearly every developer is a developer or consumer (or both) of distributed systems. Even relatively simple mobile applications are backed by cloud APIs so that their data can be available on whatever device the customer happens to be using. Whether you are new to developing distributed systems or an expert with scars on your hands to prove it, the patterns and components described in this book can transform your development of distributed systems from art to science. Reusable components and patterns for distributed systems will enable you to focus on the core details of your application. This book will help any developer become better, faster, and more efficient at building distributed systems.

Why I Wrote This Book

Throughout my career as a developer of a variety of software systems, from web search to the cloud, I have built a large number of scalable, reliable distributed systems. Each of these systems was, by and large, built from scratch. In general, this is true of all distributed applications. Even though these systems share many of the same concepts, and at times nearly identical logic, applying patterns or reusing components across them is often very, very challenging. This forced me to waste time reimplementing systems, and each system ended up less polished than it might otherwise have been.

The recent introduction of containers and container orchestrators fundamentally changed the landscape of distributed system development. Suddenly we have an object and interface for expressing core distributed system patterns and building reusable containerized components. I wrote this book to bring together all of the practitioners of distributed systems, giving us a shared language and common standard library so that we can all build better systems more quickly.

The World of Distributed Systems Today

Once upon a time, people wrote programs that ran on one machine and were also accessed from that machine. The world has changed. Now, nearly every application is a distributed system running on multiple machines and accessed by multiple users from all over the world. Despite their prevalence, the design and development of these systems is often a black art practiced by a select group of wizards. But as with everything in technology, the world of distributed systems is advancing, regularizing, and abstracting. In this book I capture a collection of repeatable, generic patterns that can make the development of reliable distributed systems more approachable and efficient. The adoption of patterns and reusable components frees developers from reimplementing the same systems over and over again, leaving them more time to focus on building the core application itself.

Navigating This Book

This book is organized into five parts as follows:

Part I, “Foundational Concepts”
Chapters 1 and 2 introduce distributed systems as well as some fundamental concepts that are essential to understanding the distributed system designs described in Part II, “Single-Node Patterns”.

Part II, “Single-Node Patterns”
Chapters 3 through 5 discuss reusable patterns and components that occur on individual nodes within a distributed system. They cover the sidecar, adapter, and ambassador single-node patterns.

Part III, “Serving Patterns”
Chapters 6 through 8 cover multinode distributed patterns for long-running serving systems like web applications. They discuss patterns for many different types of serving systems, including basic replication, sharding, and work sharing. Additionally, Chapters 9 and 10 discuss essential distributed concepts like functions, event-driven programming, and leader election.

Part IV, “Batch Computational Patterns”
Chapters 11 through 13 cover distributed system patterns for large-scale batch data processing, including work queues, event-based processing, and coordinated workflows.

Part V, “Universal Concepts”
The book concludes with several topics that are universal to all distributed systems. Chapter 14 covers logging, monitoring, and alerting for your application; Chapter 15 provides a survey of AI infrastructure; and Chapter 16 describes common failures and design errors that occur over and over again as we build distributed systems.

If you are an experienced distributed systems engineer, you can likely skip Chapters 1 and 2, though you may want to skim them to understand how I expect these patterns to be applied and why I think the general notion of distributed system patterns is so important.

You will likely find the single-node patterns useful, as they are the most generic and most reusable patterns in the book.

Depending on your goals and the systems you are interested in developing, you can choose to focus on either large-scale batch processing patterns or patterns for long-running servers (or both). The two parts are largely independent of each other and can be read in any order.

Likewise, if you have extensive distributed system experience, you may find that some of the early patterns chapters (e.g., Part III on naming, discovery, and load balancing) are redundant with what you already know, so feel free to skim through to gain the high-level insights, but don’t forget to look at all of the pretty pictures!

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a general note.

Online Resources

Though this book describes generally applicable distributed system patterns, it assumes that readers are familiar with containers and container orchestration systems. If you don’t have a lot of preexisting knowledge about these things, I recommend the following resources:

• Docker
• Kubernetes
• DC/OS

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/brendandburns/designing-distributed-systems.

If you have a technical question or a problem using the code examples, please send email to support@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Designing Distributed Systems by Brendan Burns (O’Reilly). Copyright 2025 Brendan Burns, 978-1-098-15635-0.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators shares its knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-827-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://oreilly.com/about/contact.html

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/designing-distributed-systems-2e.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media.

Watch us on YouTube: https://youtube.com/oreillymedia.

Acknowledgments

I’d like to thank my wife, Robin, and my children for everything they do to keep me happy and sane. To all of the people along the way who took the time to help me learn all of these things, many thanks! Also thanks to my parents for that first SE/30.

I would also like to thank the technical reviewers who took time to provide their feedback and make this book better:

• Dinesh Reddy Chittibala
• Anne Currie
• Chris Devers
• Werner Dijkerman
• Sukanya Moorthy
• Lalithkumar Prakashchand
• William Jamir Silva
• Rajeev Reddy Vishaka

Finally, I would like to thank the staff at O’Reilly and everyone who provided feedback for the first edition of this book. You’ve helped me make a better book, and I’m grateful.

PART I
Foundational Concepts

Before we start describing distributed systems, it is worth laying out the motivations and concepts that form the foundation of both why and how we build them. This part covers these foundational concepts to provide a basis for the rest of the book.

Distributed systems don’t exist in a vacuum. The development of such systems is shaped by the evolving role of both computing and online systems in business and entertainment, particularly the always-on, mission-critical systems that we rely on every day. Additionally, the development of modern distributed systems builds on the history of how such systems have been designed and built in the past. This history of how the systems are built, and often more importantly how they have failed, has led us to the current containerized and microservice architectures that you find in this book.

Before the design of distributed systems can be described, it is necessary to have a grounding in core concepts for how server systems operate, as well as fundamental computer science concepts like locking and APIs. It is also necessary to understand basic operations for distributed systems, such as monitoring and logging. Finally, because distributed systems involve numerous interactions across many different systems and many different requests, it is necessary to have a basic understanding of statistics and of how we can measure the common behavior across the system by observing many different requests on different machines.

After reading these introductory chapters, you should have the grounding in context, history, and concepts necessary to understand how the design of these systems is described. This grounding also helps explain why some of the seemingly complex aspects of the design are necessary for reliability or scale.
