Managing Cloud Native Data on Kubernetes: Architecting Cloud Native Data Services Using Open Source Technology

Author: Jeff Carpenter, Patrick McFadin

Technology

Is Kubernetes ready for stateful workloads? This open source system has become the primary platform for deploying and managing cloud native applications. But because it was originally designed for stateless workloads, working with data on Kubernetes has been challenging. If you want to avoid the inefficiencies and duplicative costs of having separate infrastructure for applications and data, this practical guide can help. Using Kubernetes as your platform, you’ll learn open source technologies that are designed and built for the cloud.

Authors Jeff Carpenter and Patrick McFadin provide case studies to help you explore new use cases and avoid the pitfalls others have faced. You’ll get an insider’s view of what’s coming from innovators who are creating next-generation architectures and infrastructure.

With this book, you will:

• Learn how to use basic Kubernetes resources to compose data infrastructure
• Automate the deployment and operations of data infrastructure on Kubernetes using tools like Helm and operators
• Evaluate and select data infrastructure technologies for use in your applications
• Integrate data infrastructure technologies into your overall stack
• Explore emerging technologies that will enhance your Kubernetes-based applications in the future


📄 Page 3
Praise for Managing Cloud Native Data on Kubernetes

This book challenged my notions about storing data on Kubernetes. I no longer fear the loss of data.
—Jesse Anderson, Managing Director, Big Data Institute

Managing Cloud Native Data on Kubernetes is a groundbreaking work not only because it is the first to tackle this problem space, but because it simultaneously obviates the need for any other book on the subject. Drawing on their decades of experience, Jeff and Patrick give readers the confidence to run stateful workloads on Kubernetes in production. This book will be the reference on the topic for years to come.
—Umair Mufti, Director of Product Management, Portworx by Pure Storage

Kubernetes is notoriously complex, and dealing with persistent data adds to the complexity. This book does an amazing job of taming the complexity of dealing with data using Kubernetes with many useful code examples and architectural diagrams.
—Noah Gift, Duke Executive in Residence

Storage is one of the hardest infrastructure layers to master and arguably has the longest innovation cycles. We are at the cusp of one such innovation cycle at the moment with cloud native applications. Jeff and Patrick have tackled this subject head-on, by having the readers understand the evolution of cloud native storage and help transform their storage strategy to meet the next gen application demands. Anyone that is working with microservices (which is almost everyone at the moment), must read this book before they have completed their transformation projects.
—Kiran Mova, Founder, Architect Storage Startups Open Source Advocate/Manager, VMware

I have learned a lot from reading this book! I have been working full time in the Kubernetes ecosystem for several years at Red Hat but this book touches areas that I haven’t had experience with. It was an eye-opener for me to realize that Kubernetes is not only for stateless microservices. I can clearly see where the platform is going and this book definitely helped me see that direction. Industry experts Jeff Carpenter and Patrick McFadin put some very nice articles from other experts in the book and I loved reading how tech evolved into its current state.
—Ali Ok, Principal Software Engineer, Red Hat

This is the book you need if doing persistence on Kubernetes is your ultimate goal. Jeff and Patrick do a tremendous job in this comprehensive view of Data on Kubernetes to the point where it doesn’t have to be scary, especially if you have this book on your shelf!
—Rick Vasquez, Senior Director, Strategic Initiatives, Western Digital
📄 Page 5
Managing Cloud Native Data on Kubernetes
Architecting Cloud Native Data Services Using Open Source Technology

Jeff Carpenter and Patrick McFadin

Boston Farnham Sebastopol Tokyo Beijing
📄 Page 6
Managing Cloud Native Data on Kubernetes
by Jeff Carpenter and Patrick McFadin

Copyright 2023 Jeffrey Carpenter and Patrick McFadin. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Aaron Black
Development Editor: Jill Leonard
Production Editor: Beth Kelly
Copyeditor: Justin Billing
Proofreader: Sharon Wilkey
Indexer: Potomac Indexing, LLC
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

December 2022: First Edition

Revision History for the First Edition
2022-12-01: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098111397 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Managing Cloud Native Data on Kubernetes, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors, and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

This work is part of a collaboration between O’Reilly and Portworx by Pure Storage. See our statement of editorial independence.

978-1-098-11139-7
[LSI]
📄 Page 7
Table of Contents

Foreword
Preface

1. Introduction to Cloud Native Data Infrastructure: Persistence, Streaming, and Batch Analytics
    Infrastructure Types
    What Is Cloud Native Data?
    More Infrastructure, More Problems
    Kubernetes Leading the Way
    Managing Compute on Kubernetes
    Managing Network on Kubernetes
    Managing Storage on Kubernetes
    Cloud Native Data Components
    Looking Forward
    Getting Ready for the Revolution
    Adopt an SRE Mindset
    Embrace Distributed Computing
    Principles of Cloud Native Data Infrastructure
    Summary

2. Managing Data Storage on Kubernetes
    Docker, Containers, and State
    Managing State in Docker
    Bind Mounts
    Volumes
    Tmpfs Mounts
    Volume Drivers
    Kubernetes Resources for Data Storage
    Pods and Volumes
    PersistentVolumes
    PersistentVolumeClaims
    StorageClasses
    Kubernetes Storage Architecture
    Flexvolume
    Container Storage Interface
    Container Attached Storage
    Container Object Storage Interface
    Summary

3. Databases on Kubernetes the Hard Way
    The Hard Way
    Prerequisites for Running Data Infrastructure on Kubernetes
    Running MySQL on Kubernetes
    ReplicaSets
    Deployments
    Services
    Accessing MySQL
    Running Apache Cassandra on Kubernetes
    StatefulSets
    Accessing Cassandra
    Summary

4. Automating Database Deployment on Kubernetes with Helm
    Deploying Applications with Helm Charts
    Using Helm to Deploy MySQL
    How Helm Works
    Labels
    ServiceAccounts
    Secrets
    ConfigMaps
    Updating Helm Charts
    Uninstalling Helm Charts
    Using Helm to Deploy Apache Cassandra
    Affinity and Anti-Affinity
    Helm, CI/CD, and Operations
    Summary

5. Automating Database Management on Kubernetes with Operators
    Extending the Kubernetes Control Plane
    Extending Kubernetes Clients
    Extending Kubernetes Control Plane Components
    Extending Kubernetes Worker Node Components
    The Operator Pattern
    Controllers
    Custom Resources
    Operators
    Managing MySQL in Kubernetes Using the Vitess Operator
    Vitess Overview
    PlanetScale Vitess Operator
    A Growing Ecosystem of Operators
    Choosing Operators
    Building Operators
    Summary

6. Integrating Data Infrastructure in a Kubernetes Stack
    K8ssandra: Production-Ready Cassandra on Kubernetes
    K8ssandra Architecture
    Installing the K8ssandra Operator
    Creating a K8ssandraCluster
    Managing Cassandra in Kubernetes with Cass Operator
    Enabling Developer Productivity with Stargate APIs
    Unified Monitoring Infrastructure with Prometheus and Grafana
    Performing Repairs with Cassandra Reaper
    Backing Up and Restoring Data with Cassandra Medusa
    Creating a Backup
    Restoring from Backup
    Deploying Multicluster Applications in Kubernetes
    Summary

7. The Kubernetes Native Database
    Why a Kubernetes Native Approach Is Needed
    Hybrid Data Access at Scale with TiDB
    TiDB Architecture
    Deploying TiDB in Kubernetes
    Serverless Cassandra with DataStax Astra DB
    What to Look for in a Kubernetes Native Database
    Basic Requirements
    The Future of Kubernetes Native
    Summary

8. Streaming Data on Kubernetes
    Introduction to Streaming
    Types of Delivery
    Delivery Guarantees
    Feature Scope
    The Role of Streaming in Kubernetes
    Streaming on Kubernetes with Apache Pulsar
    Preparing Your Environment
    Securing Communications by Default with cert-manager
    Using Helm to Deploy Apache Pulsar
    Stream Analytics with Apache Flink
    Deploying Apache Flink on Kubernetes
    Summary

9. Data Analytics on Kubernetes
    Introduction to Analytics
    Deploying Analytic Workloads in Kubernetes
    Introduction to Apache Spark
    Deploying Apache Spark in Kubernetes
    Build Your Custom Container
    Submit and Run Your Application
    Kubernetes Operator for Apache Spark
    Alternative Schedulers for Kubernetes
    Apache YuniKorn
    Volcano
    Analytic Engines for Kubernetes
    Dask
    Ray
    Summary

10. Machine Learning and Other Emerging Use Cases
    The Cloud Native AI/ML Stack
    AI/ML Definitions
    Defining an AI/ML Stack
    Real-Time Model Serving with KServe
    Full Lifecycle Feature Management with Feast
    Vector Similarity Search with Milvus
    Efficient Data Movement with Apache Arrow
    Versioned Object Storage with lakeFS
    Summary

11. Migrating Data Workloads to Kubernetes
    The Vision: Application-Aware Platforms
    Charting Your Path to Success
    People
    Technology
    Process
    The Future of Cloud Native Data
    Summary

Index
📄 Page 13
Foreword

You’re about to go on an amazing adventure into the heart of the biggest change in the technology industry. In this adventure, you’re part of a fellowship led by a brave duo who have dared the mountains, depths, and lakes of data for decades. You’ll journey with Patrick McFadin and Jeff Carpenter, along with a band of visionary practitioners, to attain the prize: the power to create the future of data. After reading this book, you’ll be able to create your own new adventures and bring others along with you to go beyond the old world of computation, ruled by infrastructure, into the new world of cognition, ruled by autonomous experiences. It’s going to be awesome.

The book you hold is written in a time when we’ve already seen a significant change in how we imagine, understand, and operate large-scale systems. The act of writing a book about technology in the midst of all this change may itself seem quixotic, but it’s essential. It’s a moment to stop at the Last Homely House as we gather the cognitive tools, supplies, and artifacts that will help us in the journey ahead. We need all the help we can gather because change tends to accelerate.

In a few short decades, we’ve gone from mainframes to networks to data centers to clouds. Each new era feels like a new world with new rules and new opportunities. We build ecosystems of tools to match the era; the tools enable faster progress, and we build even more of them; we grow unsatisfied despite the speed, and suddenly there is a new breakthrough that heralds another new era.

Each era needs to deal with the same concerns: networking, computation, and data. At each leap forward, they all need to transform. Mainframes and terminals, networks and routers, data centers and virtualization, clouds and containers; by architecting for new levels of abundance, each sets a new bar for velocity, scale, and unit economics. We strive to go faster, bigger, and more efficient.

As the tools change, so do the people; mindsets must be rebuilt for each new wave of abundance, from the mainframe high priests to overworked network admins to dutiful datacenter operators to savvy cloud engineers. Infrastructure has always been considered expensive since it’s a cost of doing business rather than the business itself; each era’s technology teams have needed to focus on what the business values. Mainframes and transaction processing, networks and file sharing, data centers and ecommerce, clouds and apps—each era’s north star reflects the standard business focus of the time. As we look into the near future, we see new nouns: edges, models, predictions, and decisions, collectively powering autonomous businesses.

What’s holding us back from the next era? What’s the big unlock we’re collectively struggling to achieve? It’s the one thing that we haven’t solved beautifully—yet.

The first half of the cloud era was defined by Amazon’s AWS and copied by others: singular global-scale federated datacenters using virtual machines and infrastructure microservices, designed and evolved together as one unique whole. No two of these clouds are alike. They vary by aesthetic, by identity models, by billing systems, by APIs. Just like in the beginnings of the mainframe, network, and datacenter eras, each cloud stack was vertically integrated. This lock-in offered great utility at the price of never leaving.

The second half of the cloud era is defined by Kubernetes and its vibrant ecosystem of tools, all built on the same premise: the unit of work is a container, not a virtual machine, physical server, or mainframe processor. Containers are the law of the land, representing the standard granularity of technology workloads until the next era comes along. It’s about transcending single clouds to gain a cloud native stance anywhere. The Kubernetes breakthrough is named cloud native to mark the mature state of the cloud era.

What is the magic power we find in containers? It is simply this: we’ve learned that scaling out our ideas requires scaling down our units of work.
Software is made of ideas; fluidity requires scaling down to fit these ideas into more efficient units, and leverage requires scaling out to take advantage of any available infrastructure.

The cloud native manifesto that is so well-represented by Kubernetes and its ecosystem has taken us a long way toward the future, but we now find ourselves pinned in place, short of the summit we aspire to. For all the advances we’ve made, they are focused on stateless operations. We now face the final stage of the era: cloud native data.

When we conquer this challenge collectively, we’ll have created a world where any app or model can run anywhere it’s needed, at the speed that users demand, because the data will flow with it. Whether it’s on a phone, a car, a metro edge, a cloud, or a satellite, the data will be self-describing, observable, fluid, and accessible. Infrastructure can become invisible and deliver power however developers may dream. This book is key to unlocking that potential.

Like any epic journey, cloud native data on Kubernetes is a progressive revelation. The ordinary world of storage and StatefulSets leads you to mastery of architecting data infrastructure for any given workload, from applications to analytics and machine learning. The door to the extraordinary world will then be open to you: a vision of the next generation of data management and the open source projects that are advancing the art of the possible. Open communities sharing ideas and code together are the only way we can realize this future.

Looking ahead to the next decade, we don’t know exactly what the technologies we use will be named, but we do know that they will be built on the ideas we’re making real now. Welcome to the adventure of cloud native data, and take joy in the journey!

— Sam Ramji
Chief Strategy Officer at DataStax
Strategic Advisor to the Linux Foundation
📄 Page 17
Preface

Is Kubernetes ready for stateful workloads?

This might be the question that got you to open this book. Since cloud computing first emerged, data infrastructure (NoSQL/NewSQL, streaming, analytics) and application infrastructure (Docker, Kubernetes) have been maturing rapidly but on separate tracks. In our view, it’s time to formalize bringing these two areas together. This isn’t an aspiration for the future; it is already happening with collaboration across multiple communities. Organizations that are trying to manage two distinct stacks for applications and data will soon find themselves at a competitive disadvantage.

For the first few years of Kubernetes’ existence after its public launch in 2014, the maxim that it was not ready for data and stateful workloads was rarely questioned. An example of the prevailing wisdom can be found in this Kelsey Hightower tweet from 2018:

    Kubernetes has made huge improvements in the ability to run stateful workloads including databases and message queues, but I still prefer not to run them on Kubernetes.

Over the past few years, the tide has turned. Problem-solving engineers took this challenge from Kelsey and turned it into action. In some sense, the maturation of Kubernetes for stateful workloads was inevitable, as the demand was so great. Those of us who can remember arguments about why a database had to run on a bare-metal machine or why you should never deploy data infrastructure in containers can relate to this concern. We’ve also learned that there is a huge difference between “never” and “not yet.”

Compute, storage, networking are now considered commodities; why not data management? The value proposition of Kubernetes for reducing cost and simplifying application development means that the migration of data infrastructure onto Kubernetes was inevitable. The changes are not just in Kubernetes. As you will see, projects in data infrastructure have been changing as well.
Why We Wrote This Book

We were caught up in the trend of moving stateful workloads to Kubernetes when our “day job” responsibilities at DataStax challenged us to consider how to deploy and operate Apache Cassandra in Kubernetes effectively. In the spirit of open source development, we sought out other practitioners who were attempting similar feats (and succeeding) with databases and other stateful workloads. We found a group of like-minded individuals and helped launch the Data on Kubernetes Community (DoKC) in 2020.

DoKC is now an independent organization and has hosted well over 100 meetups and several in-person events. The variety of topics and presenters in the DoKC meetup is evidence of a vibrant community, working collaboratively to establish standards and best practices. Most importantly, we are learning together, applying lessons from the past and supporting each other as we build something new.

As we participated in these meetups, a set of common themes began to emerge. We heard, again and again, the virtues of the PersistentVolume subsystem, the pros and cons of StatefulSets, the promise of the operator pattern for making database operations more manageable, and the early hints of ideas for new types of data management. Over time, we developed a strong conviction that this fledgling community of practitioners needed a place for all of the wisdom scattered across multiple presentations and blog posts to be gathered and distilled into a digestible form. This book is the result of that process.

Much work remains to be done in the area of cloud native data, and many areas need further exploration, including operators, machine learning, data APIs, declarative management of data sets, and many more. Our hope is that this book opens the gates for a flood of additional books, blogs, presentations, and learning resources.

Who Is This Book For?

The primary audience for this book comprises the developers and architects who are designing, building, and running applications in the cloud. If that describes you and you’re picking up this book, chances are you’ve heard the thundering herd of organizations adopting Kubernetes and have joined that trend or are at least considering it. However, you may have also heard the reservations about stateful workloads on Kubernetes and are looking for help in how to proceed. You’ve come to the right place! By reading this book you will gain the following:

• An understanding of basic Kubernetes resources and how they are used to compose data infrastructure
• An appreciation for how tools like Helm and operators can automate the deployment and operations of data infrastructure on Kubernetes
• The ability to evaluate and select data infrastructure technologies for use in your applications
• The knowledge of how to integrate these data infrastructure technologies into your overall stack
• A view toward emerging technologies that will enhance your Kubernetes-based applications in the years to come

A smaller but no less important audience includes core Kubernetes developers and data infrastructure developers, many of whom we’ve met through the DoKC. We hope to create a common set of principles and best practices that we can use as a framework to drive improvements into the Kubernetes core as well as the data infrastructure built to run in Kubernetes. Together we can push the practice of data on Kubernetes forward.

For everyone, know that our objective in this book is to shoot straight. Where the technology is mature and solid, we’ll let you know, but there are also many areas where the technology is still emerging. We’ll make sure to highlight those areas where improvement is needed.

How to Read This Book

This book is designed to be read from start to finish, especially by readers who are less experienced with Kubernetes. The first few chapters introduce Kubernetes terminology and concepts that are referenced throughout the remainder of the book as we discuss more advanced topics. Here’s how this book is organized:

Chapter 1, “Introduction to Cloud Native Data Infrastructure: Persistence, Streaming, and Batch Analytics”
    This chapter lays out the goal of modernizing your cloud native applications by putting not only stateless but also stateful workloads on Kubernetes. Of course we would say this, but you really should start here, as we define key goals and terms to give all readers a level playing field. Specifically, we propose a definition for the term cloud native data and define principles for cloud native data infrastructure that we’ll use to measure technologies throughout the rest of the book.
Chapter 2, “Managing Data Storage on Kubernetes”
    In this chapter, we’ll look at one of the foundational areas for data infrastructure on Kubernetes: storage. We’ll begin with how storage works in containerized systems starting with Docker, then moving to Kubernetes and its PersistentVolume subsystem. We’ll discuss the various types of storage available including file, block, and object storage, and the trade-offs of using local versus remote storage solutions.

Chapter 3, “Databases on Kubernetes the Hard Way”
    This chapter introduces Kubernetes compute resources such as Pods, Deployments, and StatefulSets and walks you through the step-by-step process of deploying databases like MySQL and Apache Cassandra using these resources. You’ll learn some of the strengths and weaknesses of StatefulSets for managing distributed databases.

Chapter 4, “Automating Database Deployment on Kubernetes with Helm”
    Continuing the themes of the previous chapter, we revisit the deployment of MySQL and Cassandra on Kubernetes, this time in a more automated fashion using the Helm package manager. You’ll also learn about Kubernetes resources that help with configuration including ConfigMaps and Secrets. We discuss the role of Helm in your overall DevOps process and CI/CD toolset and some of its shortcomings with respect to managing database operations.

Chapter 5, “Automating Database Management on Kubernetes with Operators”
    This chapter concludes our sequence on database deployment by introducing the operator pattern and demonstrating how operators can help manage “day two” database operations. We’ll examine how operators extend the Kubernetes control plane to manage databases, using Vitess (MySQL) and Cass Operator (Apache Cassandra) as examples. Along the way, you’ll learn how to assess operators’ maturity and even how to build your own operators by using frameworks such as the Operator SDK.

Chapter 6, “Integrating Data Infrastructure in a Kubernetes Stack”
    In this chapter, we begin to expand the focus beyond just deploying and operating databases to consider how databases and other data infrastructure can be incorporated in your overall application stack. We’ll look at a project called K8ssandra that integrates Apache Cassandra along with tools for managing monitoring, security, and database backups, and an API layer for easier data access.

Chapter 7, “The Kubernetes Native Database”
    At this point, we take a step back and summarize what you’ve learned about cloud native data management in the book’s first half and use that knowledge to consider the question, “What is a Kubernetes native database?” More than just a debate about industry buzzwords, this discussion is an important one for those of you who are involved in selecting data infrastructure and for those developing that infrastructure.

Chapter 8, “Streaming Data on Kubernetes”
    Moving beyond persistence, we’ll start working through the rest of the data infrastructure, starting with streaming technologies. Moving and processing data in cloud native applications is just as prevalent as database persistence, but requires different strategies in deployment: connecting endpoints securely and building in