Uploader: 高宏飞

Shared on 2026-03-26

Author: Mandeep Ubhi

DESCRIPTION

Chaos Engineering with Python is a comprehensive guide to designing, executing, and automating chaos experiments to build resilient systems. The book blends foundational theory with hands-on practice so that readers understand how to implement chaos engineering effectively. It begins by defining resilience and tracing the evolution of chaos engineering from traditional testing methods.

A core focus of the book is real-world application, demonstrating structured chaos experiments across various environments. Readers will learn fault injection techniques, how to analyze experiment results, and how to use tools like the Python Chaos Toolkit. The book extensively covers chaos engineering on Kubernetes, a critical skill for modern cloud-native applications, explores experiments on virtual machines and AWS infrastructure, and provides an overview of managed chaos services.

The book also emphasizes integrating chaos experiments into CI/CD pipelines, enabling automated, continuous resilience testing as part of the development workflow. Beyond technology, it provides guidance on embedding, measuring, and sustaining the cultural shift needed to embrace chaos engineering.

This book is a valuable resource for anyone looking to understand and implement chaos engineering, from beginners to experienced practitioners, providing both the technical know-how and the cultural understanding necessary for building truly resilient systems.

KEY FEATURES
● Progressive learning, starting with chaos principles and fault tolerance patterns, moving to practical use cases, and finishing with CI/CD integration.
● Real-world use cases, including Kubernetes and cloud infrastructure, built with a structured and repeatable approach.
● Proven methods to embed chaos engineering culture in the enterprise and highlight its business value to stakeholders.

WHAT YOU WILL LEARN
● Understand chaos engineering principles, resilience, and proactive failure testing.
● Implement chaos

ISBN: 9365892074
Publisher: BPB Publications
Publish Year: 2025
Language: English
Pages: 233
File Format: PDF
File Size: 3.2 MB
Text Preview (First 20 pages)
Chaos Engineering with Python
Python-powered techniques for building resilient systems through chaos engineering
Mandeep Ubhi
www.bpbonline.com
First Edition 2025
Copyright © BPB Publications, India
ISBN: 978-93-65892-079

All Rights Reserved. No part of this publication may be reproduced, distributed or transmitted in any form or by any means or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception of the program listings, which may be entered, stored and executed in a computer system, but cannot be reproduced by means of publication, photocopy, recording, or by any electronic and mechanical means.

LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY
The information contained in this book is true and correct to the best of the author’s and publisher’s knowledge. The author has made every effort to ensure the accuracy of these publications, but the publisher cannot be held responsible for any loss or damage arising from any information in this book. All trademarks referred to in the book are acknowledged as properties of their respective owners, but BPB Publications cannot guarantee the accuracy of this information.

www.bpbonline.com
Dedicated to
My parents, whose departed souls continue to be a source of my inspiration.
My wife and son, for being my moral compass and inspiring me to build a legacy that is worthy of their love and support.
My brother and sister, for being a steadfast presence I can always rely on, even from afar.
About the Author

Mandeep Ubhi is a passionate technologist whose career spans over two decades. He worked on his first commercial software project in the year 2000 and published a series of technical articles on .NET in “i.t. magazine” in 2003. Since then, Mandeep has been at the forefront of several technology roles in Fortune 500 companies, developing, operating, and transforming products and services at scale.

Mandeep is passionate about automation and innovation and takes pride in having built autonomous systems that reduce the need for human intervention in monotonous tasks and enhance operational excellence. He has coached numerous teams and individuals on the path to DevOps and Site Reliability Engineering. Mandeep is a highly sought-after thought leader in the DevOps and SRE domains and has been featured at dozens of international conferences as a speaker and panelist.

Mandeep is also the founder of DevSRE, a transformative shift combining the DevOps and SRE paradigms. DevSRE provides structured SRE implementation and maturity models, which make it easier to learn, implement, and mature SRE practices.

Outside of his professional life, Mandeep enjoys music and practices healthy lifestyle choices like yoga and meditation. Mandeep also follows deep philosophical topics and often spends time reading ancient and modern philosophers.
About the Reviewers

❖ Chetan Sachdeva is a tech enthusiast with over 15 years of software development expertise in C++, Rust, and Python. His dynamic career spans Automotive, AR/VR, IoT, Android native development, Typography, and Printing RIP. His true passion lies in solving complex problems with data structures and algorithms. His entrepreneurial spirit has led to the successful design of software and products for startups. Beyond his professional endeavors, Chetan is an avid reader of non-fiction and IT-related books, staying at the forefront of industry trends.

❖ Jasbeer Singh is a passionate Software Engineer and a cloud practitioner with extensive professional experience in the domains of Python development, testing methodologies, system automation, CI/CD, DevOps practices, data warehousing, and framework development. He is proficient in various tools and technologies related to these domains. Jasbeer specializes in building robust and efficient software systems and loves exploring new technologies and sharing his insights. He has reviewed this book, bringing his practical experience and deep understanding of software development best practices to his assessment of the material.
Acknowledgement

This book is the culmination of countless hours of work, thought, and collaboration to produce a precise and practical handbook on chaos engineering with Python. It would not have been possible without the support, encouragement, and inspiration of many incredible people.

To my family, thank you for being my unwavering foundation. Your love and patience have been the bedrock of my journey. Your support, even during the most difficult moments of writing and researching, has been my greatest source of strength.

To BPB Publications, I am deeply grateful for your trust in my capabilities and for your incredible support and patience while I juggled my professional commitments. Your guidance and meticulous reviews have made this book more polished, insightful, and accessible.

To the countless technology hustlers whose stories inspire me every day, you are the heartbeat of innovation. Your relentless drive to push boundaries and create meaningful solutions is what fuels my passion to do better, be better, and give back to the community. You remind me that chaos is not the enemy but an opportunity to grow, learn, and build systems that we can all be proud of.

This book is as much yours as it is mine. Thank you for being a part of this journey and for inspiring me to contribute to a community that values resilience, scalability, and operational excellence.
Preface

We live in a world where our lives are intricately woven into the fabric of the Internet, technology, and the systems that power them. After even a brief disconnection from the Internet, our daily routines seem to grind to a halt. But what keeps these systems running seamlessly, almost all the time? It is not by chance. It is the result of resilient engineering practices, one of which is chaos engineering.

This book discusses chaos engineering and offers practical ways to apply it to real-world scenarios, such as cloud infrastructure, virtualized applications, Kubernetes workloads, and even integration with CI/CD pipelines. It also demonstrates how chaos engineering practices can be seamlessly aligned with existing DevOps and Site Reliability Engineering (SRE) methodologies widely adopted by organizations today.

To cater to a diverse audience, the book adopts a progressive approach, starting with foundational concepts and gradually advancing to complex implementations. Whether you are just beginning your journey with DevOps/SRE or you are a seasoned professional, the book includes accessible code examples and practical guidance.

The book intentionally repeats a few topics to emphasize the significance of repeatability and consistency. These traits are essential for any practice to be easily adaptable and automatable. By reinforcing key principles, the book aims to make these approaches second nature for practitioners.

The topics and use cases discussed in this book are designed to intersect with your professional work, offering insights and strategies that will hopefully prove invaluable to your chaos engineering journey.

Chapter 1: Resilience and Evolution of Chaos Engineering - The term resilience can take many forms, so this chapter defines what resilience really means in the context of IT systems. Traditional software testing methods, while effective in many respects, often fall short in addressing the complexities of production workloads running on cloud and distributed systems. This gap is precisely where chaos engineering shines, offering organizations a practical and innovative approach to testing and fortifying their systems against unexpected failures.

This chapter introduces readers to the transformative power of chaos engineering, a methodology born at Netflix and now widely adopted across industries. Chaos engineering moves beyond conventional testing, employing real-world experiments to predict system behavior, test failure scenarios directly in production, and ensure graceful degradation during outages. It emphasizes minimizing the impact of tests through careful planning and blast radius control while leveraging automation to make experimentation efficient and repeatable.

By the end of this chapter, readers will grasp why chaos engineering is not just a trend but a necessity for modern IT operations. They will see how its principles align with the broader goals of ensuring resilience and reducing downtime, ultimately safeguarding user experience and business continuity. This chapter is essential for anyone eager to explore the evolution of cutting-edge practices that drive dependable, high-performing systems in an unpredictable digital landscape.

Chapter 2: Rapid Refresher on Python Essentials - For readers new to Python, this chapter serves as a comprehensive foundation. It begins by introducing Python as an accessible yet powerful programming language, detailing how to install and configure it, including setting up virtual environments for isolated and efficient development. The chapter systematically covers Python's core concepts: variables, data types, control structures, data structures, file handling, and object-oriented programming.
It also explores modules, packages, regular expressions, and the rich ecosystem of Python’s standard and third-party libraries, equipping readers with versatile tools for solving real-world challenges. Since error handling and debugging techniques are essential for any development work, the chapter emphasizes them so that readers can write resilient scripts capable of gracefully managing unexpected scenarios. By the end of this chapter, readers will have the Python refresher they need to design and execute complex chaos experiments, laying a strong foundation for building resilient systems.
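To make the idea of a script that "gracefully manages unexpected scenarios" concrete, here is a minimal sketch of exception handling with bounded retries. It is an illustration only and is not taken from the book; `TransientError`, `call_with_retry`, and `make_flaky` are hypothetical names introduced for this example.

```python
import time


class TransientError(Exception):
    """Hypothetical error type standing in for a recoverable failure."""


def call_with_retry(operation, attempts=3, delay_seconds=0.0):
    """Run operation(), retrying on TransientError up to `attempts` times."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError as error:
            last_error = error
            # Wait before the next try, but not after the final attempt.
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Chain the last failure so the root cause stays visible in tracebacks.
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


def make_flaky(failures_before_success):
    """Build a callable that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError(f"simulated failure #{state['calls']}")
        return f"succeeded on call {state['calls']}"

    return flaky


if __name__ == "__main__":
    print(call_with_retry(make_flaky(2)))
```

Only the error type that is known to be recoverable is retried; any other exception propagates immediately, which keeps genuine bugs from being hidden behind the retry loop.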
Chapter 3: Implementation Journey of Chaos Experiments - This chapter provides a practical journey into implementing chaos experiments. Using an e-commerce application as a real-world example, the chapter illustrates how chaos engineering principles and workflows are applied to design and execute meaningful experiments. The chapter also explains the anatomy of chaos experiments, with Python scripts as the backbone for injecting faults and testing system resilience.

The chapter delves into fault injection techniques, fault tolerance patterns, and the interpretation of experiment results. Key metrics, system behavior, and hypotheses are analyzed to assess how well the system responds to simulated failures. Additionally, the chapter emphasizes the vital role of monitoring and explains how the technical aspects of chaos engineering can be monitored in real production environments, especially in cloud-native and distributed systems.

By the end of this chapter, readers will have an understanding of how to plan, execute, and evaluate chaos experiments, equipping them with the skills to strengthen the reliability and robustness of their systems in real-world scenarios.

Chapter 4: Up and Running with Python Chaos Toolkit - This chapter introduces the Chaos Toolkit, a Python-based solution for designing and executing chaos experiments. Readers are guided through the features, configuration, and practical usage of the Chaos Toolkit, using a sample Python application as a hands-on example. The chapter demonstrates how the toolkit simplifies and automates the process of planning, running, and analyzing chaos experiments, making it an essential tool for scaling resilience efforts.

In addition to the Chaos Toolkit, the chapter explores other available toolkits, each suited for specific scenarios and contexts, providing readers with a broader perspective on the tools available for chaos engineering.
Furthermore, it highlights the managed services offered by major cloud providers, showcasing how these services can be leveraged to implement chaos engineering practices for cloud-native workloads.

Chapter 5: Chaos Experiments on Virtual Machines - This chapter provides a practical guide to leveraging the Chaos Engineering Toolkit for common use cases in virtualized environments, with a focus on VMware. Building on the foundational principles and workflows discussed in earlier chapters, the chapter emphasizes the importance of establishing clear business use cases, setting the right context, and following a structured chaos engineering workflow to ensure successful implementations.

Readers are guided through environment setup, including the creation of virtual environments, the installation of the Chaos Toolkit and relevant Python libraries, and programmatic connection to VMware environments. The chapter showcases the use of both the Python Chaos Toolkit and custom Python scripts to interact with virtualized environments and execute chaos experiments.

In addition to hands-on implementations, the chapter highlights best practices for conducting effective chaos experiments, such as starting with small, controlled tests, maintaining clear documentation, and adhering to robust security practices. By the end of the chapter, readers will have a comprehensive understanding of how to apply chaos engineering principles to virtualized workloads, identifying vulnerabilities and strengthening the resilience of their VMware environments.

Chapter 6: Chaos Experiments with Kubernetes - This chapter sets the stage for understanding and implementing chaos engineering in Kubernetes environments. Focusing on practical applications, it introduces key concepts and workflows essential for building resilient, cloud-native Kubernetes applications. Readers are guided through the deployment of a microservices application on Google Kubernetes Engine (GKE) and the configuration of the Chaos Toolkit, both locally and as a Kubernetes operator, to run chaos experiments.

Emphasizing a structured approach, the chapter outlines the importance of steady-state hypotheses, precise experiment design, and detailed analysis. It walks through foundational steps, including setting up monitoring and uptime checks, executing experiments such as pod deletions, and evaluating system recovery and resilience. Additionally, it addresses strategies for remediating identified weaknesses and improving overall robustness.
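As an illustration of the steady-state hypothesis and pod-deletion experiment mentioned above, the following sketch builds a Chaos Toolkit experiment declaration as a Python dictionary and prints it as JSON. It is not the book's own experiment. It assumes the open-source `chaostoolkit` CLI and the `chaostoolkit-kubernetes` extension, whose `chaosk8s.pod.actions.terminate_pods` action and argument names are given here as I understand them and should be checked against the installed version. The health URL, namespace, and label selector are hypothetical placeholders.

```python
import json


def build_pod_deletion_experiment(health_url, namespace, label_selector):
    """Return a Chaos Toolkit experiment declaration as a plain dict."""
    return {
        "version": "1.0.0",
        "title": "Service stays available when one pod is terminated",
        "description": "Terminate a single pod and check the service still answers.",
        # Steady state: what "healthy" looks like before and after the fault.
        "steady-state-hypothesis": {
            "title": "Service responds with HTTP 200",
            "probes": [
                {
                    "type": "probe",
                    "name": "service-is-healthy",
                    "tolerance": 200,
                    "provider": {"type": "http", "url": health_url, "timeout": 3},
                }
            ],
        },
        # Method: the fault to inject. One pod only, to keep the blast radius small.
        "method": [
            {
                "type": "action",
                "name": "terminate-one-pod",
                "provider": {
                    "type": "python",
                    # Assumed from the chaostoolkit-kubernetes extension.
                    "module": "chaosk8s.pod.actions",
                    "func": "terminate_pods",
                    "arguments": {
                        "label_selector": label_selector,
                        "ns": namespace,
                        "qty": 1,
                        "rand": True,
                    },
                },
                # Give the cluster a few seconds to reschedule the pod.
                "pauses": {"after": 10},
            }
        ],
        # Nothing to undo: the Deployment controller recreates the pod.
        "rollbacks": [],
    }


if __name__ == "__main__":
    experiment = build_pod_deletion_experiment(
        health_url="http://shop.example.internal/healthz",  # hypothetical URL
        namespace="default",                                # hypothetical namespace
        label_selector="app=frontend",                      # hypothetical label
    )
    print(json.dumps(experiment, indent=2))
```

Saving the printed JSON to a file such as `experiment.json` and running `chaos run experiment.json` would, under these assumptions, check the hypothesis, inject the fault, and check the hypothesis again.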
By aligning chaos experiments with Service Level Objectives (SLOs) and error budgets, the chapter also provides a framework for balancing rapid delivery with system reliability. By embedding chaos engineering into routine workflows, the chapter demonstrates how organizations can proactively uncover vulnerabilities, enhance system performance, and deliver a more consistent and reliable user experience for Kubernetes workloads. By the end of the chapter, readers will have the essential understanding needed to apply chaos engineering principles to Kubernetes workloads and further enhance their resilience.
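The relationship between an SLO and its error budget is simple arithmetic, and a short worked example may help. The sketch below is an illustration rather than the book's method: the function names are hypothetical, and the rule of only running an experiment when its expected impact fits in the remaining budget is one possible policy, assumed here for the example.

```python
def error_budget_minutes(slo_target, window_days=30):
    """Allowed downtime in minutes for an availability SLO over a window.

    slo_target is a fraction, for example 0.999 for "99.9% available".
    """
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target, downtime_minutes, window_days=30):
    """Error budget left after the downtime already observed in the window."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


def can_run_experiment(slo_target, downtime_minutes, expected_impact_minutes,
                       window_days=30):
    """Assumed policy: run only if the expected impact fits in what is left."""
    remaining = budget_remaining(slo_target, downtime_minutes, window_days)
    return remaining >= expected_impact_minutes


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # With 30 minutes already spent, a 10-minute experiment still fits.
    print(can_run_experiment(0.999, downtime_minutes=30, expected_impact_minutes=10))
    # With 40 minutes already spent, it no longer does.
    print(can_run_experiment(0.999, downtime_minutes=40, expected_impact_minutes=10))
```

For a 99.9% target over 30 days the window is 43,200 minutes, so the budget is 0.001 × 43,200 ≈ 43.2 minutes. The stricter the SLO, the less room there is for both unplanned incidents and planned experiments.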
Chapter 7: Chaos Experiments with Infrastructure - This chapter delves into the practical implementation of chaos engineering in IT infrastructure, with a focus on cloud environments. Since Chapter 6 used Google Cloud Platform (GCP) as an example, this chapter uses Amazon Web Services (AWS) to give readers a broader perspective. The chapter showcases the application of the Python Chaos Toolkit and its AWS extension to conduct chaos experiments targeting EC2 instances.

The chapter also introduces readers to managed chaos engineering services provided by AWS and Azure, emphasizing their integration within their respective ecosystems. These services offer a streamlined, low-effort approach to running experiments, though they come with associated costs. Additionally, the chapter explores how Python’s extensive collection of libraries enables developers to encapsulate and leverage managed services within custom programs, providing flexibility and extending the functionality of chaos engineering workflows.

By the end of this chapter, readers will have a comprehensive understanding of applying chaos engineering principles to cloud infrastructure, weighing the benefits and trade-offs of managed services versus custom implementations. This knowledge equips readers to identify and address vulnerabilities in their IT environments, enhancing system resilience and reliability.

Chapter 8: Integrating Chaos Experiments into CI/CD Pipelines - This chapter provides a comprehensive guide to embedding chaos engineering into CI/CD pipelines, emphasizing the importance of automation in building resilient and fault-tolerant systems. Beginning with a refresher on the fundamentals of CI/CD, the chapter underscores its pivotal role in enabling a repeatable and “shift-left” approach to rapid and reliable software delivery.
The chapter then explores the strategic and practical integration of chaos experiments into CI/CD workflows, demonstrating how these experiments proactively identify weaknesses and enhance system stability. A detailed examination of Jenkins highlights its flexibility as a CI/CD tool, showcasing its ability to orchestrate chaos experiments alongside traditional build and deployment tasks.

By aligning chaos experiments with infrastructure-as-code principles and embedding them into the software delivery lifecycle, the chapter emphasizes the importance of collaboration, improved observability, and proactive incident prevention. By the end of this chapter, readers will understand how integrating chaos engineering into CI/CD pipelines puts the automation principle of chaos engineering into practice, enabling repeatability and consistency.

Chapter 9: Embedding Chaos Engineering Culture - This chapter explores the foundational role of culture in chaos engineering, emphasizing its impact as a catalyst for thriving in uncertainty. Moving beyond tools and techniques, it delves into the mindset shift required to prioritize resilience, experimentation, and continuous improvement in modern software systems.

The chapter highlights the significance of culture in empowering teams, fostering autonomy, and encouraging accountability. It contrasts a proactive chaos engineering culture with the pitfalls of reactive firefighting, such as technical debt and team burnout. By embedding regular experimentation and data-driven decision-making into DevOps and SRE practices, a chaos engineering culture transforms resilience into a shared responsibility. Leadership plays a pivotal role in championing this cultural shift, driving collaboration, innovation, and ownership within dynamic, forward-thinking environments.

The chapter concludes by acknowledging the challenges of adopting a chaos engineering culture while celebrating the long-term rewards: resilient systems, empowered teams, and a future-ready organization equipped to navigate uncertainty.
Code Bundle and Coloured Images

Please follow the link to download the Code Bundle and the Coloured Images of the book: https://rebrand.ly/ebc4cc

The code bundle for the book is also hosted on GitHub at https://github.com/bpbpublications/Chaos-Engineering-with-Python. In case there’s an update to the code, it will be updated on the existing GitHub repository.

We have code bundles from our rich catalogue of books and videos available at https://github.com/bpbpublications. Check them out!

Errata

We take immense pride in our work at BPB Publications and follow best practices to ensure the accuracy of our content and to provide an engaging reading experience to our subscribers. Our readers are our mirrors, and we use their inputs to reflect and improve upon human errors, if any, that may have occurred during the publishing processes involved. To let us maintain the quality and help us reach out to any readers who might be having difficulties due to any unforeseen errors, please write to us at errata@bpbonline.com.

Your support, suggestions and feedback are highly appreciated by the BPB Publications’ Family.

Did you know that BPB offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.bpbonline.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at business@bpbonline.com for more details.
At www.bpbonline.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on BPB books and eBooks.

Piracy

If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at business@bpbonline.com with a link to the material.

If you are interested in becoming an author

If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit www.bpbonline.com. We have worked with thousands of developers and tech professionals, just like you, to help them share their insights with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions. We at BPB can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about BPB, please visit www.bpbonline.com.

Join our book’s Discord space

Join the book’s Discord Workspace for Latest updates, Offers, Tech happenings around the world, New Release and Sessions with the Authors: https://discord.bpbonline.com
Table of Contents

1. Resilience and Evolution of Chaos Engineering
    Introduction
    Structure
    Objectives
    Importance of resilience
    Overview of cloud systems
    Limitations of software testing
    Overcoming limitations of software testing
    Load testing versus chaos engineering
    Measuring resilience
    Principles of chaos engineering
    Hypothesizing the system behavior and steady state
    Introduce varying real-world events
    Target production environment
    Automate chaos experiments
    Role of automation and Python
    Industry perspective
    Conclusion
    Multiple choice questions
    Answers

2. Rapid Refresher on Python Essentials
    Introduction
    Structure
    Objectives
    Python overview
    Install and configure Python
    Variables and data types
    Control structures
    Data structures
    File input and output
    Object oriented programming
    Modules and packages
    Modules
    Packages
    Regular expressions
    Standard libraries
    Third-party libraries
    Exception handling
    Debugging techniques
    Conclusion

3. Implementation Journey of Chaos Experiments
    Introduction
    Structure
    Objectives
    Use case and the sample application
    Implementation journey
    Anatomy of chaos experiments
    Fault injection techniques
    Fault tolerance patterns
    Best practices
    Conclusion

4. Up and Running with Python Chaos Toolkit
    Introduction
    Structure
    Objectives
    Chaos toolkit in Python
    Chaos Toolkits in other languages
    Cloud native Chaos Toolkits
    Conclusion

5. Chaos Experiments on Virtual Machines
    Introduction
    Structure
    Objectives
    Common use cases
    Environment setup
    Virtual machines
    CPU contention
    Set the context
    Follow the chaos engineering workflow
    Disk inaccessibility
    Set the context
    Follow the chaos engineering workflow
    Industry best practices
    Conclusion

6. Chaos Experiments with Kubernetes
    Introduction
    Structure
    Objectives
    Refresher on Kubernetes
    Use cases
    Chaos engineering tools for Kubernetes
    Python Chaos Toolkit
    Environment setup
    Run chaos experiments
    Setting the context
    Following the chaos engineering workflow
    Optimize monitoring
    Python script
    Integration with DevOps and Site Reliability Engineering practices
    Conclusion

7. Chaos Experiments with Infrastructure
    Introduction
    Structure
    Objectives
    Primer on IT infrastructure
    Compute
    Networks
    Storage
    Use cases
    Chaos engineering tools for infrastructure
    Python chaos toolkit
    Environment setup for Amazon Web Services
    Set the context
    Following the chaos engineering workflow
    Optimize monitoring
    Cloud-native solutions