Foundations of Scalable Systems: Designing Distributed Architectures (Ian Gorton)

Author: Ian Gorton

Science

In many systems, scalability becomes the primary driver as the user base grows. Attractive features and high utility breed success, which brings more requests to handle and more data to manage. But organizations reach a tipping point when design decisions that made sense under light loads suddenly become technical debt. This practical book covers design approaches and technologies that make it possible to scale an application quickly and cost-effectively.

Author Ian Gorton takes software architects and developers through the foundational principles of distributed systems. You'll explore the essential ingredients of scalable solutions, including replication, state management, load balancing, and caching. Specific chapters focus on the implications of scalability for databases, microservices, and event-based streaming systems.

You will focus on:
• Foundations of scalable systems: Learn basic design principles of scalability, its costs, and architectural trade-offs
• Designing scalable services: Dive into service design, caching, asynchronous messaging, serverless processing, and microservices
• Designing scalable data systems: Learn data system fundamentals, NoSQL databases, and eventual consistency versus strong consistency
• Designing scalable streaming systems: Explore stream processing systems and scalable event-driven processing

📄 File Format: PDF
💾 File Size: 6.4 MB

📄 Text Preview (First 20 pages)


📄 Page 1
Ian Gorton
Foundations of Scalable Systems: Designing Distributed Architectures
📄 Page 2
SOFTWARE ARCHITECTURE

“Building scalable distributed systems is hard. This book just made it easier.”
—Mark Richards, Software Architect, Founder of DeveloperToArchitect.com

Foundations of Scalable Systems

In many systems, scalability becomes the primary driver as the user base grows. Attractive features and high utility breed success, which brings more requests to handle and more data to manage. But organizations reach a tipping point when design decisions that made sense under light loads suddenly become technical debt. This practical book covers design approaches and technologies that make it possible to scale an application quickly and cost-effectively.

Author Ian Gorton takes software architects and developers through the foundational principles of distributed systems. You’ll explore the essential ingredients of scalable solutions, including replication, state management, load balancing, and caching. Specific chapters focus on the implications of scalability for databases, microservices, and event-based streaming systems.

You’ll focus on:
• Foundations of scalable systems: Learn basic design principles of scalability, its costs, concurrency, and architectural trade-offs
• Designing scalable services: Dive into service design, caching, asynchronous messaging, serverless processing, and microservices
• Designing scalable data systems: Learn data system fundamentals, NoSQL databases, and eventual consistency versus strong consistency
• Designing scalable streaming systems: Explore stream processing systems and scalable event-driven processing

Ian Gorton has 30 years’ experience as a software architect, computer science professor, and consultant. With a focus on distributed technologies since graduate school, he’s worked on large-scale software systems in areas such as banking, telecommunications, government, healthcare, and scientific modeling and simulation. He’s the author of Essential Software Architecture (Springer) and Data Intensive Computing (Cambridge University Press) and has written more than 200 scientific and professional publications on software architecture and software engineering.

US $59.99 CAN $74.99
ISBN: 978-1-098-10606-5
Twitter: @oreillymedia
linkedin.com/company/oreilly-media
youtube.com/oreillymedia
📄 Page 3
Praise for Foundations of Scalable Systems

Building scalable distributed systems is hard. This book just made it easier. With topics ranging from concurrency and load balancing to caching and database scaling, you’ll learn the skills necessary to make your systems scale to meet the demands of today’s modern world.
—Mark Richards, Software Architect, Founder of DeveloperToArchitect.com

Through lively examples and a no-nonsense style, Professor Gorton presents and discusses the principles, architectures, and technologies foundational to scalable distributed systems design. This book serves as an essential modern text for students and practitioners alike.
—Anna Liu, Senior Manager, Amazon Web Services

The technology in this space is changing all the time, and there is a lot of hype and buzzwords out there. Ian Gorton cuts through that and explains the principles and trade-offs you need to understand to successfully design large-scale software systems.
—John Klein, Carnegie Mellon University Software Engineering Institute

Scalability is a serious topic in software design, and this book provides a great overview of the many aspects that need to be considered by architects and software engineers. Ian Gorton succeeds in striking an excellent balance between theory and practice, presenting his real-life experience in a way that is immediately useful. His lighthearted writing style makes for an enjoyable and easy read, with the occasional sidetrack to explain things like the link between software architecture and Italian-inspired cuisine.
—Eltjo Poort, Architect, CGI
📄 Page 4
In the era of cloud computing, scalability is a system characteristic that is easy to take for granted until you find your system hasn’t got it. In this book, Dr. Ian Gorton draws on his wide practical, research, and teaching experience to explain scalability in a very accessible way and provide a thorough introduction to the technologies and techniques that are used to achieve it. It is likely to save its readers from a lot of painful learning experiences when they find that they need to build a highly scalable system!
—Dr. Eoin Woods, CTO, Endava

Dealing with issues of distributed systems, microservice architecture, serverless architecture, and distributed databases makes creating a system that can scale to support tens of thousands of users extremely challenging. Ian Gorton has clearly laid out the issues and given a developer the tools they need to contribute to the development of a system that can scale.
—Len Bass, Carnegie Mellon University

Trade-offs are key to a distributed system. Professor Gorton puts out great explanations with real-life scenarios for distributed systems and other key related areas, which will help you develop a trade-off mindset for making better decisions.
—Vishal Rajpal, Senior Software Development Engineer, Amazon

This is the book to read, whether you’re a distributed systems learner or an experienced software engineer. Dr. Gorton brings together his decades of academic research and cloud industry case studies to equip you with the key knowledge and skills you need to build scalable systems and succeed in the cloud computing era.
—Cong Li, Software Engineer, Microsoft
📄 Page 5
Ian Gorton
Foundations of Scalable Systems: Designing Distributed Architectures
Boston Farnham Sebastopol Tokyo Beijing
📄 Page 6
978-1-098-10606-5
[LSI]

Foundations of Scalable Systems
by Ian Gorton

Copyright © 2022 Ian Gorton. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Melissa Duffield
Development Editor: Virginia Wilson
Production Editor: Jonathon Owen
Copyeditor: Justin Billing
Proofreader: nSight, Inc.
Indexer: nSight, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

July 2022: First Edition

Revision History for the First Edition
2022-06-29: First Release

See https://oreil.ly/scal-sys for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Foundations of Scalable Systems, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
📄 Page 7
Table of Contents

Preface

Part I. The Basics

1. Introduction to Scalable Systems
    What Is Scalability?
    Examples of System Scale in the Early 2000s
    How Did We Get Here? A Brief History of System Growth
    Scalability Basic Design Principles
    Scalability and Costs
    Scalability and Architecture Trade-Offs
    Performance
    Availability
    Security
    Manageability
    Summary and Further Reading

2. Distributed Systems Architectures: An Introduction
    Basic System Architecture
    Scale Out
    Scaling the Database with Caching
    Distributing the Database
    Multiple Processing Tiers
    Increasing Responsiveness
    Systems and Hardware Scalability
    Summary and Further Reading
📄 Page 8
3. Distributed Systems Essentials
    Communications Basics
    Communications Hardware
    Communications Software
    Remote Method Invocation
    Partial Failures
    Consensus in Distributed Systems
    Time in Distributed Systems
    Summary and Further Reading

4. An Overview of Concurrent Systems
    Why Concurrency?
    Threads
    Order of Thread Execution
    Problems with Threads
    Race Conditions
    Deadlocks
    Thread States
    Thread Coordination
    Thread Pools
    Barrier Synchronization
    Thread-Safe Collections
    Summary and Further Reading

Part II. Scalable Systems

5. Application Services
    Service Design
    Application Programming Interface (API)
    Designing Services
    State Management
    Applications Servers
    Horizontal Scaling
    Load Balancing
    Load Distribution Policies
    Health Monitoring
    Elasticity
    Session Affinity
    Summary and Further Reading
📄 Page 9
6. Distributed Caching
    Application Caching
    Web Caching
    Cache-Control
    Expires and Last-Modified
    Etag
    Summary and Further Reading

7. Asynchronous Messaging
    Introduction to Messaging
    Messaging Primitives
    Message Persistence
    Publish–Subscribe
    Message Replication
    Example: RabbitMQ
    Messages, Exchanges, and Queues
    Distribution and Concurrency
    Data Safety and Performance Trade-offs
    Availability and Performance Trade-Offs
    Messaging Patterns
    Competing Consumers
    Exactly-Once Processing
    Poison Messages
    Summary and Further Reading

8. Serverless Processing Systems
    The Attractions of Serverless
    Google App Engine
    The Basics
    GAE Standard Environment
    Autoscaling
    AWS Lambda
    Lambda Function Life Cycle
    Execution Considerations
    Scalability
    Case Study: Balancing Throughput and Costs
    Choosing Parameter Values
    GAE Autoscaling Parameter Study Design
    Results
    Summary and Further Reading
📄 Page 10
9. Microservices
    The Movement to Microservices
    Monolithic Applications
    Breaking Up the Monolith
    Deploying Microservices
    Principles of Microservices
    Resilience in Microservices
    Cascading Failures
    Bulkhead Pattern
    Summary and Further Reading

Part III. Scalable Distributed Databases

10. Scalable Database Fundamentals
    Distributed Databases
    Scaling Relational Databases
    Scaling Up
    Scaling Out: Read Replicas
    Scale Out: Partitioning Data
    Example: Oracle RAC
    The Movement to NoSQL
    NoSQL Data Models
    Query Languages
    Data Distribution
    The CAP Theorem
    Summary and Further Reading

11. Eventual Consistency
    What Is Eventual Consistency?
    Inconsistency Window
    Read Your Own Writes
    Tunable Consistency
    Quorum Reads and Writes
    Replica Repair
    Active Repair
    Passive Repair
    Handling Conflicts
    Last Writer Wins
    Version Vectors
    Summary and Further Reading
📄 Page 11
12. Strong Consistency
    Introduction to Strong Consistency
    Consistency Models
    Distributed Transactions
    Two-Phase Commit
    2PC Failure Modes
    Distributed Consensus Algorithms
    Raft
    Leader Election
    Strong Consistency in Practice
    VoltDB
    Google Cloud Spanner
    Summary and Further Reading

13. Distributed Database Implementations
    Redis
    Data Model and API
    Distribution and Replication
    Strengths and Weaknesses
    MongoDB
    Data Model and API
    Distribution and Replication
    Strengths and Weaknesses
    Amazon DynamoDB
    Data Model and API
    Distribution and Replication
    Strengths and Weaknesses
    Summary and Further Reading

Part IV. Event and Stream Processing

14. Scalable Event-Driven Processing
    Event-Driven Architectures
    Apache Kafka
    Topics
    Producers and Consumers
    Scalability
    Availability
    Summary and Further Reading
📄 Page 12
15. Stream Processing Systems
    Introduction to Stream Processing
    Stream Processing Platforms
    Case Study: Apache Flink
    DataStream API
    Scalability
    Data Safety
    Conclusions and Further Reading

16. Final Tips for Success
    Automation
    Observability
    Deployment Platforms
    Data Lakes
    Further Reading and Conclusions

Index
📄 Page 13
Preface

This book is built around the thesis that the ability of software systems to operate at scale is increasingly a key factor that defines success. As our world becomes more interconnected, this characteristic will only become more prevalent. Hence, the goal of this book is to provide the reader with the core knowledge of distributed and concurrent systems. It also introduces a collection of software architecture approaches and distributed technologies that can be used to build scalable systems.

Why Scalability?

The pace of change in our world is daunting. Innovations appear daily, creating new capabilities for us all to interact, conduct business, entertain ourselves…even end pandemics. The fuel for much of this innovation is software, written by veritable armies of developers in major internet companies, crack small teams in startups, and all shapes and sizes of teams in between.

Delivering software systems that are responsive to user needs is difficult enough, but it becomes an order of magnitude more difficult to do for systems at scale. We all know of systems that fail suddenly when exposed to unexpected high loads—such situations are (in the best cases) bad publicity for organizations, and at worst can result in lost jobs or destroyed companies.

Software is unlike physical systems in that it’s amorphous—its physical form (1s and 0s) bears no resemblance to its actual capabilities. We’d never expect to transform a small village of 500 people into a city of 10 million overnight. But we sometimes expect our software systems to suddenly handle one thousand times the number of requests they were designed for. Unsurprisingly, the outcomes are rarely pretty.
📄 Page 14
Who This Book Is For

The major target audience for this book is software engineers and architects who have zero or limited experience with distributed, concurrent systems. They need to deepen both their theoretical and practical design knowledge in order to meet the challenges of building larger-scale, typically internet-facing applications.

What You Will Learn

This book covers the landscape of concurrent and distributed systems through the lens of scalability. While it’s impossible to totally divorce scalability from other architectural qualities, scalability is the main focus of discussion. Of course, other qualities necessarily come into play, with performance, availability, and consistency regularly raising their heads.

Building distributed systems requires some fundamental understanding of distribution and concurrency—this knowledge is a recurrent theme throughout this book. It’s needed because of the two essential problems in distributed systems that make them complex, as I describe below.

First, although systems as a whole operate perfectly correctly nearly all the time, an individual part of the system may fail at any time. When a component fails (whether due to a hardware crash, network outage, bug in a server, etc.), we have to employ techniques that enable the system as a whole to continue operations and recover from failures. Every distributed system will experience component failure, often in weird, mysterious, and unanticipated ways.

Second, creating a scalable distributed system requires the coordination of multiple moving parts. Each component of the system needs to keep its part of the bargain and process requests as quickly as possible. If just one component causes requests to be delayed, the whole system may perform poorly and even eventually crash.

There is a rich body of literature available to help you deal with these problems. Luckily for us engineers, there’s also an extensive collection of technologies that are designed to help us build distributed systems that are tolerant to failure and scalable. These technologies embody theoretical approaches and complex algorithms that are incredibly hard to build correctly. Using these platform-level, widely applicable technologies, our applications can stand on the shoulders of giants, enabling us to build sophisticated business solutions.

Specifically, readers of this book will learn:
• The fundamental characteristics of distributed systems, including state management, time coordination, concurrency, communications, and coordination
📄 Page 15
• Architectural approaches and supporting technologies for building scalable, robust services
• How distributed databases operate and can be used to build scalable distributed systems
• Architectures and technologies such as Apache Kafka and Flink for building streaming, event-based systems

Note for Educators

Much of the content of this book has been developed in the context of an advanced undergraduate/graduate course at Northeastern University. It has proven a very popular and effective approach for equipping students with the knowledge and skills needed to launch their careers with major internet companies. Additional materials on the book website are available to support educators who wish to use the book for their course.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
    Shows commands or other text that should be typed literally by the user.
Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a general note.
📄 Page 16
This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://oreil.ly/fss-git-repo.

If you have a technical question or a problem using the code examples, please send email to bookquestions@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Foundations of Scalable Systems by Ian Gorton (O’Reilly). Copyright 2022 Ian Gorton, 978-1-098-10606-5.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.
📄 Page 17
How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/scal-sys.

Email bookquestions@oreilly.com to comment or ask technical questions about this book.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media
Follow us on Twitter: https://twitter.com/oreillymedia
Watch us on YouTube: https://www.youtube.com/oreillymedia

Acknowledgments

None of this work would ever have happened without the inspiration afforded to me by my graduate school advisor, Professor Jon Kerridge. His boundless enthusiasm has fueled me in this work for three decades.

Matt Bass and John Klein from Carnegie Mellon University were invaluable resources in the early stages of this project. I thank them for the great discussions about the whole spectrum of scalable software architectures.

My reviewers have been excellent—diligent and insightful—and have kept me on the right track. Eternal gratitude is due to Mark Richards, Matt Stine, Thiyagu Palanisamy, Jess Males, Orkhan Huseynli, Adnan Rashid, and Nirav Aga. And many thanks to Virginia Wilson for fixing my wonky words!

I’d also like to thank all my students, and especially Ruijie Xiao, in the CS6650 Building Scalable Distributed Systems course at Northeastern University in Seattle. You’ve provided me with invaluable feedback on how best to communicate the many complex concepts covered in this book. You are the best guinea pigs ever!
📄 Page 19
PART I
The Basics

The first four chapters in Part I of this book advocate the need for scalability as a key architectural attribute in modern software systems. These chapters provide broad coverage of the basic mechanisms for achieving scalability, the fundamental characteristics of distributed systems, and an introduction to concurrent programming. This knowledge lays the foundation for what follows, and if you are new to the areas of distributed, concurrent systems, you’ll need to spend some time on these chapters. They will make the rest of the book much easier to digest.