Uploader: 高宏飞
Shared on 2025-12-20

Authors: Kelly Shortridge, Aaron Rinehart

Cybersecurity is broken. Year after year, attackers remain unchallenged and undeterred, while engineering teams feel pressure to design, build, and operate "secure" systems. Failure can't be prevented, mental models of systems are incomplete, and our digital world constantly evolves. How can we verify that our systems behave the way we expect? What can we do to improve our systems' resilience? In this comprehensive guide, authors Kelly Shortridge and Aaron Rinehart help you navigate the challenges of sustaining resilience in complex software systems by using the principles and practices of security chaos engineering. By preparing for adverse events, you can ensure they don't disrupt your ability to innovate, move quickly, and achieve your engineering and business goals.

• Learn how to design a modern security program
• Make informed decisions at each phase of software delivery to nurture resilience and adaptive capacity
• Understand the complex systems dynamics upon which resilience outcomes depend
• Navigate technical and organizational trade-offs that distort decision making in systems
• Explore chaos experimentation to verify critical assumptions about software quality and security
• Learn how major enterprises leverage security chaos engineering

ISBN: 1098113829
Publisher: O'Reilly Media
Publish Year: 2023
Language: English
Pages: 428
File Format: PDF
File Size: 14.2 MB

Kelly Shortridge with Aaron Rinehart
Security Chaos Engineering
Sustaining Resilience in Software and Systems
SECURITY ENGINEERING

“This book brings the reader in on a well-kept secret: security is more about people and processes than about technology. It is our mental models of those elements that drive our efforts and outcomes.”
—Bob Lord, former Chief Security Officer of the DNC

“Security Chaos Engineering is a must read for technology leaders and engineers today, as we operate increasingly complex systems.”
—Dr. Nicole Forsgren, Lead author of Accelerate and Partner at Microsoft Research

“Kelly and Aaron bring immense cross-domain, real-world experience to complexity in software ecosystems in a way that all security professionals should find accessible and fascinating.”
—Phil Venables, Chief Information Security Officer, Google Cloud

Twitter: @oreillymedia
linkedin.com/company/oreilly-media
youtube.com/oreillymedia
Kelly Shortridge is a senior principal engineer at Fastly whose career has been dedicated to bringing cybersecurity out of the dark ages.

Aaron Rinehart is a senior distinguished engineer, SRE and Chaos Engineering at Capital One.

US $65.99 CAN $82.99
ISBN: 978-1-098-11382-7
Praise for Security Chaos Engineering

Security Chaos Engineering is a must read for technology leaders and engineers today, as we operate increasingly complex systems. Security Chaos Engineering presents clear evidence that systems resilience is a shared goal of both ops and security teams, and showcases tools and frameworks to measure, design, and instrument systems to improve the resilience and security of our systems. 10/10 strong recommend (kidding but also not).
—Dr. Nicole Forsgren, lead author of Accelerate and partner at Microsoft Research

Shortridge weaves multiple under-served concepts into the book’s guidance, like recognizing human biases, the power of rehearsals, org design, complex systems, systems thinking, habits, design thinking, thinking like a product manager and a financial planner, and much more. This book brings the reader in on a well-kept secret: security is more about people and processes than about technology. It is our mental models of those elements that drive our efforts and outcomes.
—Bob Lord, former Chief Security Officer of the DNC and former Chief Information Security Officer of Yahoo

As our societies become more digitized then our software ecosystems are becoming ever more complex. While complexity can be considered the enemy of security, striving for simplicity as the sole tactic is not realistic. Rather, we need to manage complexity and a big part of that is chaos engineering. That is testing, probing, modeling, and nudging complex systems to a better state. This is tough, but Kelly and Aaron bring immense cross-domain, practical real-world experience to this area in a way that all security professionals should find accessible and fascinating.
—Phil Venables, Chief Information Security Officer, Google Cloud
Security Chaos Engineering provides a much-needed reframing of cybersecurity that moves it away from arcane rules and rituals, replacing them with modern concepts from software and resiliency engineering. If you are looking for ways to uplift your security approaches and engage your whole engineering team in the process, this book is for you.
—Camille Fournier, engineering leader and author, The Manager’s Path

We as defenders owe it to ourselves to make life as hard for attackers as possible. This essential work expertly frames this journey succinctly and clearly and is a must read for all technology leaders and security practitioners, especially in our cloud native world.
—Rob Duhart, Jr., VP, Deputy Chief Information Security Officer and Chief Information Security Officer eCommerce at Walmart

Security Chaos Engineering is an unflinching look at how systems are secured in the real world. Shortridge understands both the human and the technical elements in security engineering.
—George Neville-Neil, author of the Kode Vicious column in ACM Queue Magazine

Security masquerades as a technical problem, but it really cuts across all layers: organizational, cultural, managerial, temporal, historical, and technical. You can’t even define security without thinking about human expectations, and the dividing line between “flaw” and “vulnerability” is non-technical. This thought-provoking book emphasizes the inherent complexity of security and the need for flexible and adaptive approaches that avoid both box-ticking and 0day-worship.
—Thomas Dullien, founder, security researcher, and performance engineer
Kelly Shortridge with Aaron Rinehart
Security Chaos Engineering
Sustaining Resilience in Software and Systems
Beijing Boston Farnham Sebastopol Tokyo
978-1-098-11382-7
[LSI]

Security Chaos Engineering
by Kelly Shortridge with Aaron Rinehart

Copyright © 2023 Aaron Rinehart and Kelly Shortridge. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: John Devins
Development Editor: Michele Cronin
Production Editor: Clare Laylock
Copyeditor: Nicole Taché
Proofreader: Audrey Doyle
Indexer: Sue Klefstad
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea
Chapter-Opener Image Designer: Savannah Glitschka

March 2023: First Edition

Revision History for the First Edition
2023-03-30: First Release
2023-07-07: Second Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098113827 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Security Chaos Engineering, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk.
If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
Table of Contents

Preface

1. Resilience in Software and Systems
What Is a Complex System?
Variety Defines Complex Systems
Complex Systems Are Adaptive
The Holistic Nature of Complex Systems
What Is Failure?
Acute and Chronic Stressors in Complex Systems
Surprises in Complex Systems
What Is Resilience?
Critical Functionality
Safety Boundaries (Thresholds)
Interactions Across Space-Time
Feedback Loops and Learning Culture
Flexibility and Openness to Change
Resilience Is a Verb
Resilience: Myth Versus Reality
Myth: Robustness = Resilience
Myth: We Can and Should Prevent Failure
Myth: The Security of Each Component Adds Up to Resilience
Myth: Creating a “Security Culture” Fixes Human Error
Chapter Takeaways

2. Systems-Oriented Security
Mental Models of System Behavior
How Attackers Exploit Our Mental Models
Refining Our Mental Models
Resilience Stress Testing
The E&E Resilience Assessment Approach
Evaluation: Tier 1 Assessment
Mapping Flows to Critical Functionality
Document Assumptions About Safety Boundaries
Making Attacker Math Work for You
Starting the Feedback Flywheel with Decision Trees
Moving Toward Tier 2: Experimentation
Experimentation: Tier 2 Assessment
The Value of Experimental Evidence
Sustaining Resilience Assessments
Fail-Safe Versus Safe-to-Fail
Uncertainty Versus Ambiguity
Fail-Safe Neglects the Systems Perspective
The Fragmented World of Fail-Safe
SCE Versus Security Theater
What Is Security Theater?
How Does SCE Differ from Security Theater?
How to RAVE Your Way to Resilience
Repeatability: Handling Complexity
Accessibility: Making Security Easier for Engineers
Variability: Supporting Evolution
Chapter Takeaways

3. Architecting and Designing
The Effort Investment Portfolio
Allocating Your Effort Investment Portfolio
Investing Effort Based on Local Context
The Four Failure Modes Resulting from System Design
The Two Key Axes of Resilient Design: Coupling and Complexity
Designing to Preserve Possibilities
Coupling in Complex Systems
The Tight Coupling Trade-Off
The Dangers of Tight Coupling: Taming the Forest
Investing in Loose Coupling in Software Systems
Chaos Experiments Expose Coupling
Complexity in Complex Systems
Understanding Complexity: Essential and Accidental
Complexity and Mental Models
Introducing Linearity into Our Systems
Designing for Interactivity: Identity and Access Management
Navigating Flawed Mental Models
Chapter Takeaways

4. Building and Delivering
Mental Models When Developing Software
Who Owns Application Security (and Resilience)?
Lessons We Can Learn from Database Administration Going DevOps
Decisions on Critical Functionality Before Building
Defining System Goals and Guidelines on “What to Throw Out the Airlock”
Code Reviews and Mental Models
“Boring” Technology Is Resilient Technology
Standardization of Raw Materials
Developing and Delivering to Expand Safety Boundaries
Anticipating Scale and SLOs
Automating Security Checks via CI/CD
Standardization of Patterns and Tools
Dependency Analysis and Prioritizing Vulnerabilities
Observe System Interactions Across Space-Time (or Make More Linear)
Configuration as Code
Fault Injection During Development
Integration Tests, Load Tests, and Test Theater
Beware Premature and Improper Abstractions
Fostering Feedback Loops and Learning During Build and Deliver
Test Automation
Documenting Why and When
Distributed Tracing and Logging
Refining How Humans Interact with Build and Delivery Practices
Flexibility and Willingness to Change
Iteration to Mimic Evolution
Modularity: Humanity’s Ancient Tool for Resilience
Feature Flags and Dark Launches
Preserving Possibilities for Refactoring: Typing
The Strangler Fig Pattern
Chapter Takeaways

5. Operating and Observing
What Does Operating and Observing Involve?
Operational Goals in SCE
The Overlap of SRE and Security
Measuring Operational Success
Crafting Success Metrics like Attackers
The DORA Metrics
SLOs, SLAs, and Principled Performance Analytics
Embracing Confidence-Based Security
Observability for Resilience and Security
Thresholding to Uncover Safety Boundaries
Attack Observability
Scalable Is Safer
Navigating Scalability
Automating Away Toil
Chapter Takeaways

6. Responding and Recovering
Responding to Surprises in Complex Systems
Incident Response and the Effort Investment Portfolio
Action Bias in Incident Response
Practicing Response Activities
Recovering from Surprises
Blameless Culture
Blaming Human Error
Hindsight Bias and Outcome Bias
The Just-World Hypothesis
Neutral Practitioner Questions
Chapter Takeaways

7. Platform Resilience Engineering
Production Pressures and How They Influence System Behavior
What Is Platform Engineering?
Defining a Vision
Defining a User Problem
Local Context Is Critical
User Personas, Stories, and Journeys
Understanding How Humans Make Trade-Offs Under Pressure
Designing a Solution
The Ice Cream Cone Hierarchy of Security Solutions
System Design and Redesign to Eliminate Hazards
Substitute Less Hazardous Methods or Materials
Incorporate Safety Devices and Guards
Provide Warning and Awareness Systems
Apply Administrative Controls Including Guidelines and Training
Two Paths: The Control Strategy or the Resilience Strategy
Experimentation and Feedback Loops for Solution Design
Implementing a Solution
Fostering Consensus
Planning for Migration
Success Metrics
Chapter Takeaways

8. Security Chaos Experiments
Lessons Learned from Early Adopters
Lesson #1. Start in Nonproduction Environments; You Can Still Learn a Lot
Lesson #2. Use Past Incidents as a Source of Experiments
Lesson #3. Publish and Evangelize Experimental Findings
Setting Experiments Up for Success
Designing a Hypothesis
Designing an Experiment
Experiment Design Specifications
Conducting Experiments
Collecting Evidence
Analyzing and Documenting Evidence
Capturing Knowledge for Feedback Loops
Document Experiment Release Notes
Automating Experiments
Easing into Chaos: Game Days
Example Security Chaos Experiments
Security Chaos Experiments for Production Infrastructure
Security Chaos Experiments for Build Pipelines
Security Chaos Experiments in Cloud Native Environments
Security Chaos Experiments in Windows Environments
Chapter Takeaways

9. Security Chaos Engineering in the Wild
Experience Report: The Existence of Order Through Chaos (UnitedHealth Group)
The Story of ChaoSlingr
Step-by-Step Example: PortSlingr
Experience Report: A Quest for Stronger Reliability (Verizon)
The Bigger They Are…
All Hands on Deck Means No Hands on the Helm
Assert Your Hypothesis
Reliability Experiments
Cost Experiments
Performance Experiments
Risk Experiments
More Traditionally Known Experiments
Changing the Paradigm to Continuous
Lessons Learned
Experience Report: Security Monitoring (OpenDoor)
Experience Report: Applied Security (Cardinal Health)
Building the SCE Culture
The Mission of Applied Security
The Method: Continuous Verification and Validation (CVV)
The CVV Process Includes Four Steps
Experience Report: Balancing Reliability and Security via SCE (Accenture Global)
Our Roadmap to SCE Enterprise Capability
Our Process for Adoption
Experience Report: Cyber Chaos Engineering (Capital One)
What Does All This Have to Do with SCE?
What Is Secure Today May Not Be Secure Tomorrow
How We Started
How We Did This in Ye Olden Days
Things I’ve Learned Along the Way
A Reduction of Guesswork
Driving Value
Conclusion
Chapter Takeaways

Index
Preface

In Life’s wave, in action’s storm,
I float, up and down,
I blow, to and fro!
Birth and the tomb,
An eternal flow,
A woven changing,
A glow of Being.
Over Time’s quivering loom intent,
Working the Godhead’s living garment.
— Faust

If you’ve worked with computers in a professional setting at any point since the dawn of the millennium, you’ve probably heard that security is important. By now, you’ve also probably realized cybersecurity is broken. Humans are entrusting us, as people working with software, with more and more of their lives, and we are failing to keep that trust. Year after year, the same sorts of attacks ravage the coasts and heartlands of our ever-growing digital territories.

Meanwhile, the security industry accumulates power and money, indulging in newer, shinier technology and oft deepening its sanctimonious effrontery. Success outcomes remain amorphous; and in the background slinks an existential dread that security can’t “keep up” with software. Fingers point and other fingers point back. Our security programs coagulate into performative rituals—a modern humoralism based in folk wisdom and tradition rather than empiricism. Software engineering teams simmer in resentment, yearning for answers on how to keep their systems safe without requiring ritual sacrifices. We know we can do better, but we struggle to chart a course when immersed in the murky obliqueness that is cybersecurity today.
A fundamental shift in both philosophy and practice is nigh. Cybersecurity must embrace the reality that failure will happen. Humans will click on things, and sometimes it will be the wrong thing. The security implications of simple code changes won’t be clear to developers. Mitigations will accidentally be disabled. Things will break (and are, in fact, breaking all the time). This shift requires us to transform toward resilience—the ability to recover from failure and adapt as our context changes.

This book is an attack on current cybersecurity strategy and execution. To evoke author and activist Jane Jacobs, this attack is on the principles and aims that have shaped traditional cybersecurity strategy and execution, not quibbles about specific methods or design patterns. We call this transformation “Security Chaos Engineering,” the subject of this tome.

Security Chaos Engineering (SCE) is a sociotechnical transformation that drives value to organizations through an ability to respond to failure and adapt to evolving conditions with speed and grace. To set this transformation up for success, we will draw on interdisciplinary insights regarding the resilience of complex systems. In essence, we can copy the homework of other problem domains to make solving our computer and software problems clearer.

We, as humanity, have gotten quite good at software. Most software is now part of distributed systems. Transformational technology shifts such as cloud computing, microservices, and continuous delivery have each flowered enhancements in customer value but, in turn, have effloresced a new series of challenges too. Primary among those challenges is that we’ve reached a state where the systems we build have become impossible for our minds to conceive in totality. What starts as our little software seedling grows into something astonishing and impossible to mentally model, should it succeed.
Throughout this book, we cover the philosophies, practices, and principles that will help you achieve outcome-driven security and transform toward resilience. We will discuss far more than just the art of conducting chaos experiments—although the scientific method is essential for our quest. SCE is our ambitious extension of the practice of chaos engineering, which began with the goal of promoting systems resilience from a performance perspective rather than a security perspective.

Chaos engineering is the practice of continual experimentation to validate that our systems operate the way we believe they do. These experiments help uncover systemic weaknesses or gaps in our understanding, informing improved design and practices that can help the organization gain more confidence in the systems’ behavior.

The aim of chaos engineering is to fix things in production, not break them. There is little point to suffering, in any domain, if we cannot learn from it and devise a constructive course for improvement. Throughout the book, we’ve taken great care to avoid any analogies related to warfare or violence—the kind you usually find in cybersecurity and that insinuate that militaristic, disciplinary solutions are needed to solve our problems. Instead, we hope to inspire the industry with analogies from nature, ecology, and other domains that involve nurturing, nourishing, and supporting humans in achieving their goal outcomes. We can be creators rather than castigators. We can succeed in our security aims without dehumanizing users, treating them like objects to control lest they thwart our zealous ambitions.

By the end of the book, you—no matter your role—will understand how to sustain resilience in your software and systems so your organization can thrive despite the presence of attackers. You will learn how to adapt to adversity and maintain continuous change as the world evolves around you and your systems. You’ll discover that security can escape the dark ages and enter the enlightenment era by embracing empiricism and experimentation. We hope this ignites a meta-transformation away from a security status quo—one that served us before we learned better—toward resilience so we can, at last, outmaneuver attackers and get real stuff done.

Who Should Read This Book?

If your responsibility is to design, develop, build, deploy, deliver, operate, recover, manage, protect, or secure systems that include software, then this book is for you.

This book is for humans involved in software and systems engineering across titles and focal areas—software engineers, software architects, security engineers, and security architects; site reliability engineers; platform engineering teams and their leaders; infrastructure, cloud, or DevOps engineers and the directors and VPs of those teams; CTOs, CIOs, and CISOs; and, of course, students who aspire to leave an indelible mark through their work, making the software footprint of humanity better in any way they can.
This book is especially relevant if your software, services, and systems are complex—which is most software, services, and systems that are internet-connected and the byproduct of many minds over many years. No matter where you sit in the software delivery lifecycle—or outside of it, as an administrator, manager, or defender—this book offers you wisdom on how to support your systems’ resilience to attack and other adverse conditions from your sphere of influence.

You should have a basic understanding of what software is and how organizations use it. Some practical experience either designing, delivering, or operating software systems or else implementing a security program is helpful—but we recognize that few people possess experience in both. This book is explicitly designed to teach software people about security and security people about software while extending and enriching existing experts’ knowledge too.
If any of the following outcomes compel you, then you’ll find this book valuable:

• Learn how to design a modern security program.
• Make informed decisions at each phase of software delivery to nurture resilience and adaptive capacity.
• Understand the complex systems dynamics upon which resilience outcomes depend.
• Navigate technical and organizational trade-offs that distort decision making in systems.
• Explore chaos experimentation to verify critical assumptions about software quality and security.
• Learn how major enterprises leverage security chaos engineering.

As we’ll emphasize, and reemphasize, your strategy for nourishing your systems’ resilience to attack depends on your specific context. Every organization, no matter the size, age, or industry, can benefit from investing in resilience via the SCE transformation we’ll describe in these pages. This book is explicitly not written only for hyperscalers and Fortune 100 organizations; the content is simply too valuable.

Scope of This Book

This book does not prescribe specific technologies, nor does it provide instructions on how to implement in code the opportunities it describes. We encourage you to peruse relevant documentation for such details and to exercise the unique skills you bring to your organization. Our goal is to discuss the principles, practices, and trade-offs that matter when we consider systems resilience, offering you a cornucopia of opportunities across your software activities from which you can pluck the patterns you feel will most likely bear fruit for your organization.

Outline of This Book

We begin our journey in Chapter 1, “Resilience in Software and Systems”, by discussing resilience in complex systems, how failure manifests, how resilience is maintained, and how we can avoid common myths that lead our security strategy astray.
In Chapter 2, “Systems-Oriented Security”, we explore the needed shift toward systems thinking in security, describing how to refine mental models of systems behavior and perform resilience assessments before comparing SCE to traditional cybersecurity (“security theater”).

The structure for Chapters 3 to 6 acts as a reference guide you can pull out at each stage of software delivery. Chapter 3, “Architecting and Designing”, starts in the “first” phase of software delivery: architecting and designing systems. We think through how to invest effort based on your organization’s specific context before describing opportunities to invest in looser coupling and linearity.

In Chapter 4, “Building and Delivering”, we map the five features that define resilience to activities we can pursue when developing, building, testing, and delivering systems. The ground we cover is expansive, from code reviews, standardization of “raw materials,” automating security checks, and Configuration as Code, to test theater, type systems, modularity, and so much more (this chapter is perhaps the most packed full of practical wisdom).

Chapter 5, “Operating and Observing”, describes how we can sustain resilience as our systems run in production—and as we operate and observe our systems. We reveal the overlap between site reliability engineering (SRE) and security goals, then discover different strategies for security observability before closing with a discussion of scalability’s relevance to security.

In Chapter 6, “Responding and Recovering”, we move on to what happens after an incident, digging into the biases that can distort our decision making and learning during this phase—including action bias, hindsight bias, outcome bias, and the just-world hypothesis. Along the way, we propose tactics for countering those biases and supporting more constructive efforts, particularly with an eye to eradicating the especially unproductive blame game of declaring “human error” as the “root cause” of incidents.

Chapter 7, “Platform Resilience Engineering”, introduces the concept of platform resilience engineering and describes how to implement it in practice within any organization. We cover the process for creating security solutions for internal customers (like engineering teams), including defining a vision, defining a user problem, designing a solution, and implementing a solution.
The Ice Cream Cone Hierarchy of Security Solutions, which we cover in this chapter, is especially tasty (and practical) wisdom.

In Chapter 8, “Security Chaos Experiments”, we learn how to conduct experiments and paint a richer picture of our systems, which in turn helps us better navigate strategies to make them more resilient to failure. We outline the end-to-end experimentation process: how to set your experiments up for success; designing hypotheses; designing experiments and writing experiment specifications; conducting experiments and collecting evidence; and analyzing and documenting evidence.

Finally, in Chapter 9, “Security Chaos Engineering in the Wild”, we learn from chaos experiments conducted in the wild. Real organizations that have adopted SCE and have conducted chaos experiments generously impart their wisdom through a series of case studies. We’ll learn from UnitedHealth Group, Verizon, OpenDoor, Cardinal Health, Accenture Global, and Capital One.
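The experimentation loop outlined for Chapter 8 (design a hypothesis, design and conduct an experiment, collect evidence, then analyze and document it) can be illustrated with a deliberately tiny Python sketch. It is not taken from the book, which avoids prescribing code. It assumes a hypothetical in-memory `ToyFirewall`, a hypothetical `detect_unapproved_ports` monitor, and arbitrary port numbers as stand-ins for real infrastructure and detection tooling.

```python
"""Minimal, hypothetical sketch of a security chaos experiment loop.

ToyFirewall, detect_unapproved_ports, and the port numbers are illustrative
assumptions, not an API or procedure from the book or any real tool.
"""
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyFirewall:
    # Hypothetical in-memory stand-in for a real firewall or security-group config.
    open_ports: set[int] = field(default_factory=lambda: {443})


def detect_unapproved_ports(fw: ToyFirewall, approved: set[int]) -> set[int]:
    # Hypothetical monitor: reports every open port that is not on the approved list.
    return fw.open_ports - approved


@dataclass
class Evidence:
    # A record of the experiment, suitable for documenting and sharing findings.
    hypothesis: str
    injected_port: int
    detected: set[int]
    hypothesis_held: bool


Monitor = Callable[[ToyFirewall, set[int]], set[int]]


def run_experiment(fw: ToyFirewall, approved: set[int], port: int, monitor: Monitor) -> Evidence:
    # 1. Hypothesis: state what we believe the system will do.
    hypothesis = f"If port {port} is opened without approval, the monitor reports it."
    # 2. Confirm a clean steady state before injecting anything.
    if monitor(fw, approved):
        raise RuntimeError("monitor already reports findings; not in steady state")
    snapshot = set(fw.open_ports)
    try:
        # 3. Conduct: inject the adverse condition (a misconfiguration).
        fw.open_ports.add(port)
        # 4. Collect evidence from the monitor.
        detected = monitor(fw, approved)
    finally:
        # Always roll back the injected change, even if the monitor raises.
        fw.open_ports = snapshot
    # 5. Analyze: did observed behavior match the hypothesis?
    return Evidence(hypothesis, port, detected, port in detected)


if __name__ == "__main__":
    working = run_experiment(ToyFirewall(), {443}, 22, detect_unapproved_ports)
    print(working.hypothesis_held)  # True
    # A "blind" monitor that never reports anything exposes a gap in our mental model.
    blind = run_experiment(ToyFirewall(), {443}, 22, lambda fw, approved: set())
    print(blind.hypothesis_held)  # False
```

Running it prints `True` for the working monitor and `False` for the blind one. The failed hypothesis is the useful result: it exposes a gap between how we believe the system behaves and how it actually behaves. Restoring the snapshot in a `finally` block is one way to keep the injected misconfiguration from outliving the experiment.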
Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-829-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://www.oreilly.com/about/contact.html
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/SecurityChaosEngineering.
For news and information about our books and courses, visit https://oreilly.com.
Find us on LinkedIn: https://linkedin.com/company/oreilly-media.
Follow us on Twitter: https://twitter.com/oreillymedia.
Watch us on YouTube: https://youtube.com/oreillymedia.
Acknowledgments
From Aaron: We would like to acknowledge the SCE community contributions made to this body of work by the following individuals. These early pioneers of Security Chaos Engineering have helped shape the community and the craft that exists today. Thank you so much for all the sacrifices and support!
• Brian Bagdzinski
• Jamie Dicken
• Rob Duhart, Jr.
• Troy Koss
• Matas Kulkovas
• David Lavezzo
• Omar Marrero
• Charles Nwatu
• Mario Platt
• Kennedy Torkura
• Dan Walsh
• Jerome Walters
• James Wickett
• Sounil Yu
From Kelly: This book weaves together threads of thought across countless disciplines, scholars, and practitioners. I lost count of the papers, blog posts, books, and conversations I inhaled before and during the course of writing this book, and this is reflected in the many references throughout the book. It is especially
through insights from other problem domains that we can break free from traditional cybersecurity’s stifling insularity; there is much we can learn from others’ mistakes, stumbles, and successes, even if their quests did not involve computers. In truth, it would be challenging to enumerate each source of inspiration in the book, especially those lacking a traditional citation. If we’ve ever challenged the status quo wisdom of systems security together in fervent conversation, I extend my gratitude, including to Allan Alford, Geoff Belknap, Juan Pablo Buriticá, Lita Cho, Erinn Clark, Tasha Drew, Rob Duhart, Jr., Thomas Dullien, Dr. Josiah Dykstra, Camille Fournier, Dr. Nicole Forsgren, Jessie Frazelle, Eitan Goldstein, Bea Hughes, Kyle Kingsbury, Julia Knecht, Toby Kohlenberg, Mike Leibowitz, Kelly Lum, Caitie McCaffrey, Fernando Montenegro, Christina Morillo, Renee Orser, Ryan Petrich, Greg Poirier, Alex Rasmussen, Dr. Andrew Ruef, Snare, Inés Sombra, Jason Strange, James Turnbull, Phil Vachon, and Sounil Yu. I’m especially grateful to the entire Rantifesto crew (you know who you are) and Doctor Animal for their computer wisdom; may I continue to learn from you and throw shade with you. A few chosen humans, our technical reviewers, were especially valuable in shaping this book. Thank you for devouring all this material in such a short period of time, providing constructive feedback and, in some places, proffering inspiration for new content. This book is stronger as a direct result of your efforts: Juan Pablo Buriticá, Will Gallego, Bea Hughes, Ryan Petrich, Alex Rasmussen, and Jason Strange. To Aaron Rinehart, my unindicted co-conspirator in promulgating the resilience transformation, I am forever indebted that you DM’d me asking if I wanted to work on a book with you. For her endless patience with my frenetic writing patterns, strange requests, and oft overwrought literary references, I am grateful to our editor, Michele Cronin.
To our production editor, Clare Laylock, and copyeditor, Nicole Taché, I am appreciative of their tireless efforts in ensuring a speedy, smooth production process, and of their patience with my perfectionism. And thanks to the team at O’Reilly and John Devins for allowing all these words to exist outside the confines of my brain. Special thanks are due to Savannah Glitschka, who brought Chaos Kitty to life and infused each chapter with magical illustrations. And for giving me the space-time to devote to writing, supporting my challenge of the security status quo, and being a thoughtful teacher, I thank Sean Leach.