Architecting for Scale: How to Maintain High Availability and Manage Risk in the Cloud

Author: Atchison, Lee

Science

Every day, companies struggle to scale critical applications. As traffic volume and data demands increase, these applications become more complicated and brittle, exposing risks and compromising availability. With the popularity of software as a service, scaling has never been more important.

Updated with an expanded focus on modern architecture paradigms such as microservices and cloud computing, this practical guide provides techniques for building systems that can handle huge quantities of traffic, data, and demand—without affecting the quality your customers expect. Architects, managers, and directors in engineering and operations organizations will learn how to build applications at scale that run more smoothly and reliably to meet the needs of customers.

• Learn how scaling affects the availability of your services, why that matters, and how to improve it
• Dive into a modern service-based application architecture that ensures high availability and reduces the effects of service failures
• Explore the Single Team Owned Service Architecture paradigm (STOSA)—a model for scaling your development organization in tandem with your application
• Understand, measure, and mitigate risk in your systems
• Use the cloud to build highly scalable applications

📄 File Format: PDF
💾 File Size: 10.0 MB

📄 Text Preview (First 20 pages)

📄 Page 1
Lee Atchison Second Edition Architecting for Scale How to Maintain High Availability and Manage Risk in the Cloud
📄 Page 3
Praise for Architecting for Scale

Don’t bet against your business. Build as if being successful at scale is a foregone conclusion. Architecting for Scale tells you in a no-nonsense way how to go about it.
—Colin Bodell, VP Engineering, Shopify Plus; previously VP Website Applications Platform, Amazon.com

Architecting for Scale is a definitive guide for directors, managers, and architects who want an actionable roadmap on operating at Scale.
—Ken Gavranovic, EVP & GM New Relic; CEO/Founder (Interland, now web.com)

Building systems with failure in mind is one of the keys to building highly scaled applications that perform. This book helps you learn this and other techniques to keep your applications performing as your customers—and your company—grow.
—Patrick Franklin, EVP & CTO at American Express; previously VP of Engineering, Google

This book helps show you how to keep your application performing while it—and your company—scale to meet your customer’s growing needs.
—Lew Cirne, CEO, New Relic
📄 Page 5
Lee Atchison

Architecting for Scale
How to Maintain High Availability and Manage Risk in the Cloud

SECOND EDITION

Boston Farnham Sebastopol Tokyo Beijing
📄 Page 6
978-1-492-05717-8 [LSI]

Architecting for Scale
by Lee Atchison

Copyright © 2020 Lee Atchison. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Kathleen Carr
Developmental Editor: Amelia Blevins
Production Editor: Beth Kelly
Copyeditor: Jasmine Kwityn
Proofreader: Arthur Johnson
Indexer: Ellen Troutman-Zaig
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

July 2016: First Edition
February 2020: Second Edition

Revision History for the Second Edition
2020-02-28: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781492057178 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Architecting for Scale, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
📄 Page 7
To Beth My love, my life, my everything
📄 Page 9
Table of Contents

Forewords xv
Preface xix

Part I. Tenet 1. Availability: Maintaining Availability in Modern Applications

1. Understanding, Measuring, and Improving Your Availability 3
    Availability Versus Reliability 4
    What Causes Poor Availability? 5
    Measuring Availability 6
    The Nines 7
    Planned Outages Are Still Outages 8
    Availability by the Numbers 8
    Improving Your Availability When It Slips 8
    Measure and Track Your Current Availability 9
    Automate Your Manual Processes 10
    Improve Your Systems 14
    Keep on Top of Availability in Your Changing and Growing Application 14
    Five Focuses to Improve Application Availability 15
    Focus #1: Build with Failure in Mind 16
    Focus #2: Always Think About Scaling 17
    Focus #3: Mitigate Risk 18
    Focus #4: Monitor Availability 20
    Focus #5: Respond to Availability Issues in a Predictable and Defined Way 21
    Being Prepared 22
📄 Page 10
2. Two Mistakes High—Having Room to Recover from Mistakes 23
    Two Mistakes High 24
    Scenario #1: Losing a Node 25
    Scenario #2: Problems During Upgrades 27
    Scenario #3: Data Center Resiliency 28
    Scenario #4: Hidden Shared Failure Types 30
    Scenario #5: Failure Loops 31
    Managing Your Applications 32
    The Space Shuttle 32

Part II. Tenet 2. Modern Application Architecture: Using Services

3. Using Services 37
    The Monolith Application Versus the Service-Based Application 37
    The Ownership Benefit 40
    The Scaling Benefit 42
    Splitting into Services 43
    What Should Be a Service? 43
    Dividing into Services 44
    Guideline #1: Specific Business Requirements 44
    Guideline #2: Distinct and Separable Team Ownership 45
    Guideline #3: Naturally Separable Data 47
    Guideline #4: Shared Capabilities/Data 48
    Mixed Reasons 49
    Going Too Far 49
    Finding the Right Balance 50

4. Services and Data 53
    Stateless Services—Services Without Data 53
    Stateful Services—Services with Data 53
    Data Partitioning 54
    Timely Handling of Growing Pains 57

5. Dealing with Service Failures 59
    Cascading Service Failures 59
    Responding to a Service Failure 61
    Predictable Response 61
    Understandable Response 62
    Reasonable Response 62
    Determining Failures 63
    Appropriate Action 66
📄 Page 11
    Graceful Degradation 66
    Graceful Backoff 66
    Fail as Early as Possible 67
    Customer-Caused Problems 68
    Summary 69

Part III. Tenet 3. Organization: Scaling Your Organization for Modern Applications

6. Service Ownership—STOSA 73
    Single Team Owned Service Architecture 73
    Advantages of a STOSA Application and Organization 75
    What Does It Mean to “Own” a Service? 76
    Using Core Teams and Services 78
    Summary 79

7. Service Tiers 81
    Application Complexity 81
    What Are Service Tiers? 82
    Assigning Service Tier Labels to Services 83
    Example: Online Store 85
    Using Service Tiers 87
    Expectations 88
    Responsiveness 88
    Dependencies 89
    Summary 91

8. Service-Level Agreements 93
    What Are SLAs? 94
    External Versus Internal SLAs 96
    Why Are Internal SLAs Important? 96
    SLAs for Problem Diagnosis 98
    Performance Measurements for SLAs 99
    Limit SLAs 99
    Top Percentile SLAs 100
    SLA Conditionals 103
    How Many and Which Internal SLAs? 103
    Why Internal SLAs Are Important 104
📄 Page 12
Part IV. Tenet 4. Risk: Risk Management for Modern Applications

9. Using Risk Management When Architecting for Scale 107
    Identify Risk 107
    Remove Worst Offenders 108
    Mitigate 108
    Review Regularly 109
    Managing Risk Summary 109
    Likelihood Versus Severity 110
    The Top 10 List: Low Likelihood, Low Severity Risk 111
    The Order Database: Low Likelihood, High Severity Risk 111
    Custom Fonts: High Likelihood, Low Severity Risk 112
    T-Shirt Photos: High Likelihood, High Severity Risk 113
    The Risk Matrix 114
    Scope of the Risk Matrix 116
    Creating the Risk Matrix 117
    Using the Risk Matrix for Planning 120
    Maintaining the Risk Matrix 120
    Risk Mitigation 122
    Recovery Plans 124
    Disaster Recovery Plans 125
    Improving Our Risk Situation 125

10. Game Days 127
    Staging Versus Production Environments 127
    Staging/Test Environments 127
    Production Environments 129
    Concerns with Running Game Days in Production 129
    Summary 131

11. Building Systems with Reduced Risk 133
    Technique #1: Introduce Redundancy 134
    Idempotent Interfaces 134
    Redundancy Improvements That Increase Complexity 136
    Technique #2: Understand Independence 136
    Technique #3: Manage Security 138
    Technique #4: Encourage Simplicity 138
    Technique #5: Build in Self-Repair 139
    Technique #6: Standardize on Operational Processes 140
    Summary 141
📄 Page 13
Part V. Tenet 5. Cloud: Utilizing the Cloud

12. Getting Started Architecting for Scale with the Cloud 145
    Six Levels of Cloud Maturity 146
    Level 1: Experimenting with the Cloud 147
    Level 2: Securing the Cloud 147
    Level 3: Using Servers and Applications in the Cloud 147
    Level 4: Enabling Value-Added Managed Services 148
    Level 5: Enabling Cloud-Unique Services 148
    Level 6: Cloud All In 149
    Organization Versus Application Maturity Level 149
    Cloud Adoption Mistakes 149
    Trap #1: Not Trusting Cloud Security 150
    Trap #2: Performing Cloud Migration via Lift-and-Shift 150
    Trap #3: The Lure of Serverless—Depending Too Much on the Hype 151
    When and How to Use Multiple Clouds 151
    Defining What We Mean by Multiple Clouds 152
    Which Model? Which Cloud? 155
    The Cloud in Summary 156

13. Five Industry Trends Changed by the Cloud 157
    What Has Changed in the Cloud? 157
    Change #1: Acceptance of Microservice-Based Architectures 157
    Change #2: Smaller, More Specialized Cloud Services 158
    Change #3: Greater Focus on the Application 158
    Change #4: The Micro Startup 158
    Change #5: Security and Compliance Has Matured 159
    Change Continues 159

14. Types of SaaS and Tenancy 161
    Comparing Managed Hosting and Different Types of SaaS 161
    Managed Hosting 162
    Multi-Tenant SaaS 163
    Single-Tenant SaaS 164
    Mixing Different Types of SaaS 165
    Common SaaS Characteristics 165
    SaaS Versus Managed Hosting 165
    Summary 166

15. Distributing Your Application in the AWS Cloud 167
    AWS Architecture 168
    AWS Region 168
📄 Page 14
    AWS Availability Zone 169
    Data Center 169
    Architecture Overview 169
    Availability Zones Are Not Data Centers 173
    Maintaining Location Diversity for Availability Reasons 174
    AWS—Mapping Availability Zones in Multiple Accounts 175
    Distributing Your Application 176

16. Managed Infrastructure 177
    Structure of Cloud-Based Services 177
    Raw Resource 178
    Server-Based Managed Resource 180
    Serverless Managed Resource 181
    Implications of Using Managed Versus Non-Managed Resources 183
    Summary 184

17. Cloud Resource Allocation 185
    Usage-Based Resources Allocation 186
    Allocated-Capacity Resource Allocation 188
    Changing Allocations 189
    Automated Allocation of Resource Capacity 190
    Issues with Automatic Allocation 190
    Dynamic Allocation, Dynamic Cost 192
    Pros and Cons of Usage-Based Versus Allocated-Capacity 193

18. Serverless and Functions as a Service 195
    Example Application #1: Event Processing 196
    Example Application #2: Mobile Backend 197
    Example Application #3: Internet of Things Data Intake 197
    Advantages and Disadvantages of FaaS 198
    Serverless Hype and the Future of FaaS 199

19. Edge Computing 201
    Edge Computing Today 202
    Why We Care 203
    What Should Be in the Edge Versus the Cloud? 203
    How Do We Decide? The Driverless Car 204
    Edge Scaling Isn’t the Same as Cloud Scaling 206
    Criteria for Using Edge Versus Cloud 208
    Eight Keys to Success in the Edge 209
    #1: Be Smart About What Goes on the Edge 209
    #2: Don’t Ignore DevOps Principles in the Edge 209
📄 Page 15
    #3: Nail a Highly Distributed Deployment Strategy 209
    #4: Reduce Versioning as Much as Possible 210
    #5: Reduce Per Node Provisioning and Configuration Options 210
    #6: Scaling Is an Edge Issue, Not Just a Cloud Issue 211
    #7: Nail Monitoring and Analytics 211
    #8: The Edge Is Not Magic 211
    Edge Computing Overall 212

20. Geographic Impact on Using the Cloud 213
    Cloud Matters Everywhere, But at Different Levels 213
    Replacement Mentality Impacts How You Adopt Cloud 214
    Which Cloud Is Most Important? 215
    Important Technologies Differ 216
    Data Sovereignty Is Universal 216
    My Take 217

Part VI. Conclusion

21. Putting It All Together 221
    Tenet #1—Availability 221
    Tenet #2—Architecture 222
    Tenet #3—Organization 222
    Tenet #4—Risk 222
    Tenet #5—Cloud 223
    Architecting for Scale 223

Index 227
📄 Page 17
Forewords

Foreword for Second Edition

Architecting for Scale is a comprehensive book for managers who realize that all companies have shifted away from simply calling themselves “digital businesses” and instead now recognize that if they don’t actually operate as one, they will go out of business. Banking, insurance, and other industries that used to have huge moats are being disrupted by upstart companies that deliver amazing experiences because they operate like a digital business rather than merely talking about being a digital business.

Architecting for Scale is a definitive guide for directors, managers, and architects who want an actionable roadmap on operating at scale with high reliability, implementing modern operational principles (DevOps, site reliability engineering), as well as how to use current state of the art concepts and services (microservices, cloud, edge).

I had the pleasure of working with Lee at New Relic, which enables companies to monitor their digital business across the globe. While at New Relic, Lee traveled around the world, helping companies navigate digital transformation, accelerate ideas into production, and deliver services that were up 100% of the time. Time and time again, I have seen Lee leapfrog companies’ transformation progress in a single thirty-minute meeting.

Enjoy the book! It will be impactful to your company and your career!

— Ken Gavranovic
Former EVP & GM, New Relic
CEO/Founder, Interland (now Web.com)
📄 Page 18
Foreword for First Edition

We are living in interesting times, a software Cambrian explosion if you will, where the cost of building new systems has fallen by orders of magnitude and the connectivity of systems has grown by equal orders of magnitude. Resources like Amazon’s AWS, Microsoft’s Azure, and Google’s GCP make it possible for us to physically scale our systems to sizes that we could only have imagined a few years ago. The economics of these resources and seemingly limitless capacity is producing a uniquely rapid radiation of new ideas, new products, and new markets in ways that were never possible before. But all of these new explorations are possible only if the systems we build can scale. While it is easier than ever to build something small, building a system that can scale quickly and reliably proves to be a lot harder than just spinning up more hardware and more storage.

Software systems go through a predictable lifecycle starting with small well-crafted solutions fully understood by a single person, through the rapid growth into a monolith of technical debt, thence fissioning into an ad hoc collection of fragile services, and finally into a well-engineered distributed system able to scale reliably in both breadth (more users) and depth (more features). It’s easy to see what needs to be done from the outside (make it more reliable!) and much harder to see the path from the inside. Fortunately, this book is the essential guidebook for the journey—from availability to service tiers, from game days to risk matrices, Lee describes the key decisions and practices for systems that scale.

Lee joined me at New Relic when we were first moving from being a single product monolith into being a multiproduct company, all while enjoying the hypergrowth in satisfied customers that made New Relic so successful. Lee came with a lot of experience at Amazon, both on the retail side, where they grew a lot, and on the AWS side, where—guess what?—they grew a lot. Lee has been part of teams and led teams and been actively involved in a whole lot of scaling, and he has the scars to prove it. Fortunately for us, he’s lived through the mistakes and suffered through fiendishly difficult outages and is now passing along those lessons so that we don’t have to get those same scars.

When Lee joined New Relic, we were suffering through our awkward teenage fail whale years. Our primitive monolith was suffering from our success, and our availability, reliability, and performance were not good. By putting in place the techniques he’s written about in this book, we graduated from those high school years and built the robust enterprise-level service that exists today. One of our tools was establishing four levels of availability engineering: Bronze, Silver, Gold, and Platinum. To earn the Bronze level, a team had to have a risk matrix—it had to have defined SLAs. To earn the Silver level, a team had to be monitoring for the problems identified in the matrix and be using game days; Gold meant that the risks were mitigated; and Platinum was
📄 Page 19
like a CMM Level 5 where the systems were self-healing and the focus was on continuous improvement. We prioritized these efforts for the Tier 1 services first, then the Tier 2 services, etc., and we eventually got everyone to at least Silver and most of the teams through Gold (and a couple to Platinum).

When I moved to InVision App, I joined a younger company, again moving through the transition from early success to hypergrowth, and thus I’m driving forward all these same techniques and tools that Lee describes. I urge you, in your journey as part of this exciting explosion of new systems and products and companies, to do the same: to learn from Lee in building your systems for scale.

— Bjorn Freeman-Benson, Ph.D.