Implementing Data Mesh
Design, Build, and Implement Data Contracts, Data Products, and Data Mesh
Jean-Georges Perrin and Eric Broda
Foreword by Scott Hirleman
Implementing Data Mesh
by Jean-Georges Perrin and Eric Broda

Copyright © 2024 Oplo LLC and Broda Group Software Inc. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Aaron Black
Development Editor: Shira Evans
Production Editor: Beth Kelly
Copyeditor: Shannon Turlington
Proofreader: Krsta Technology Solutions
Indexer: nSight, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

September 2024: First Edition

Revision History for the First Edition
2024-09-04: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098156220 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Implementing Data Mesh, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-15622-0
[LSI]
Dedication Liz, Thank you for supporting my crazy ideas, like this book (and the next ones we have not yet talked about). Je t’aime. —Jean-Georges Susan, Davis, and Graeham, Thanks for your support, ideas, and help! You guys are the best! —Eric
Foreword When I first encountered the Data Mesh concept in late 2020, I was much more steeped in the best practices of the operational world, especially site reliability engineering (SRE) and distributed systems. Zhamak Dehghani, the creator of the Data Mesh concept, proposed a number of ways of doing analytics, machine learning, and data work that felt very familiar and a bit obvious because of how long they had been adopted in the software world, like shifting ownership left, product thinking, continuous integration and continuous delivery (CI/CD), etc. However, after I spent a few weeks digging deeper, including into what people were actually saying about Data Mesh in posts on LinkedIn and Twitter (so not just the articles, presentations, and podcasts), there was a resounding question underlying everything: OK, but how? The data world in general hadn’t done work in any way close to what Data Mesh calls for, and trying to simply jump into entirely new ways of working is 1) disruptive in general and 2) abhorrent to most data people… Because of all the questions around Data Mesh, I created a community that is now over 10,000 people strong (Data Mesh Learning) and a podcast to explore this question, now with more than
300 episodes (Data Mesh Radio). Because that question is still the most pertinent and pernicious one about Data Mesh today: OK, but how? From the very early days of the community, both Jean-Georges and Eric have been helping people explore this question. Not just what Data Mesh is (I do not miss the days when there were 15 new articles with that title each week), but how we go about actually implementing Data Mesh as a concept in our organizations. There aren’t exact answers, because it’s like asking how someone can live a good life: everyone’s sense of what matters and what is of value is different, as is everyone’s starting point. If you think it through, there can’t be an exact answer to how we do Data Mesh because every organization is different: the aims of the organization, maturity levels, what data can do to help the organization compete, the org structure, and so on. If you are already very distributed, but your data work is very siloed and sharing between lines of business (aka domains) is not the status quo, the steps you have to take to move forward are vastly different from those of a highly centralized, command-and-control-type organization.
But there is hope. A number of organizations have driven significant value through a Data Mesh approach. Since the beginning of the Data Mesh community, Jean-Georges and Eric have been exploring which patterns actually matter for getting from where you are today toward better, and, of course, how to keep getting better and better. It’s always about getting to good enough for now, not about getting things perfect. There are potentially thousands of decisions someone leading a Data Mesh journey might have to make, so the question becomes which decisions matter, when, and why. I will emphasize this point because it is probably the second most important one to make: in a Data Mesh implementation, there are thousands of decisions to make, and most don’t matter much in the greater scheme. Still go for good answers, but this book will help you focus on the decisions that provide the best path to success as well as the ones that provide the most leverage. What you wear to do yard work probably isn’t going to impact your life much (do wear safety gear, though!), but what you wear might impact the outcome of a date or an interview. Likewise, there will be many questions in a Data Mesh journey where you aren’t sure of the right answer, but this book will help you better assess which ones really matter and how to
measure the success of your approach so that you can improve it as you continue to drive to better. There are multiple paths to success with Data Mesh; we’ve seen that across the hundreds of companies that have talked publicly about their journeys. So we need to extract what is working, and why, from those organizations’ specific contexts and learn from it. That’s what Jean-Georges and Eric are doing in this book for you. It is a very difficult task, and by taking it on, they’ve made your life far easier! I have probably spoken with close to 500 people across 1,000-plus conversations about Data Mesh, and there are few I would put at the same level as Eric and Jean-Georges in their ability to figure out what matters, when, and why, and especially to communicate it succinctly. That communication part is crucial, and they do it extremely well. It’s also crucial to pay attention to one word used over and over in Data Mesh literature: journey. You do not have to get this perfect up-front. You will learn, and the most important thing is to realize that you can try, test, and iterate. It’s about getting to better, not getting to perfect. And the journey of a thousand miles begins with a single step. We’re all still learning how to do Data Mesh well, and this book crystallizes much of what
the broader community and industry have learned to date. It can help you immensely with doing Data Mesh better and lower your stress by helping you focus on what matters instead of getting bogged down in those thousands of less impactful decisions. I also want to emphasize again that there is no exact playbook. Take what you read here, understand the reasoning behind the decisions, and adapt them to your organization. If you want an easy approach, just throw all your data into a data swamp and be done with it. This is going to take some work on your end, but I promise you, Data Mesh can bring incredible value to the organization and incredible personal value and fulfillment to those leading the implementation. In wrapping up so that you can get to the good stuff, I will give you, dear reader, the succinct warning and encouragement I give to everyone exploring Data Mesh in an attempt to make their organization better in oh so many ways: you can’t copy someone else’s blueprint, but you absolutely should dig into which decisions were the key leverage points. And don’t get too bogged down in the exact technical details. Data Mesh is a crucial driver in making your organization better able to leverage data, but it’s about so much more than the platform. Have patience and give yourself the
grace of making not-so-great decisions with an eye on constantly making better ones. And good luck, it’s gonna be a heck of a ride. :) Scott Hirleman Founder, Data Mesh Learning Host, Data Mesh Radio
Preface In 2019, Zhamak Dehghani came up with the concept of Data Mesh. It took her 18 months to refine her ideas into the four principles of Data Mesh: domain ownership, data as a product, self-serve data platform, and (my favorite) federated computational governance. The movement was launched. It has the strength of the tide: nothing is stopping it. Her foundational book, Data Mesh (O’Reilly), was published in 2022 and confirmed an entire industry’s strong desire to move to a better model. But… To implement Dehghani’s vision, practitioners needed a guide: a hands-on, practical way to build Data Mesh. We took on this task in early 2023 and are delivering it to you now. We hope you will enjoy it. If this book is not enough for you, remember that both of us are available to help you make your Data Mesh a reality!
Who Is This Book For? This book is for all those who are interested in Data Mesh concepts and implementation, regardless of their level of expertise. We value your interest and are excited to share our insights with you. If you are a data engineer, you will learn how your job and tasks will evolve with Data Mesh. Don’t worry: it will make your day-to-day a lot more interesting. If you are an architect, you will discover how this software architecture will benefit data architecture and allow you to build better data platforms. If you are a technology leader, this book will give you all the help you need to build and implement Data Mesh. Most importantly, Part III imparts a lot of knowledge about change management and the social aspects of Data Mesh. If you are a nontechnical C-level executive, this book may not be the best investment for you, but it is for all the technology staff in your organization. Think about end-of-year gifts, birthdays, anniversaries… You will make people happy.
Overview of the Parts and Chapters This book is divided into three parts and 16 fast-paced chapters. Part I, “The Basics”, sets up the basics and gives a quick reminder of Dehghani’s work: Chapter 1, “Understanding Data Mesh: The Essentials”, outlines the fundamental principles of Data Mesh, a modern data architecture paradigm that promotes decentralized data ownership, treats data as a product, and implements self-serve infrastructure for domain teams. Building on Dehghani’s foundational work, this chapter emphasizes how Data Mesh introduces agility into data management by enabling local autonomy and faster response times as well as by fostering a culture of innovation and collaboration. Chapter 2, “Applying Data Mesh Principles”, summarizes the key principles of Data Mesh and focuses on how they apply to data products, including the FAIR (findable, accessible, interoperable, and reusable) product, as well as what constitutes a good data product and the lifecycles of data products. The goals of this chapter are to create a practical Data Mesh roadmap, translate your strategy and vision into an achievable plan, secure executive sponsorship and
funding, empower a skilled data product owner with decision-making authority, and engage customers while maintaining flexibility and alignment with business objectives. Chapter 3, “Our Case Study: Climate Quantum Inc.”, introduces Climate Quantum Inc., a fictional company leveraging Data Mesh capabilities to address the complexities of managing climate data, making it more accessible, usable, and trustworthy. By decentralizing data ownership and using a domain-oriented architecture, Climate Quantum Inc. aims to streamline the discovery, consumption, sharing, and verification of vast and varied climate data, thus providing a scalable solution to the multifaceted challenges posed by climate change. Part II, “Designing, Building, and Deploying Data Mesh”, focuses on the technology aspect of Data Mesh: Chapter 4, “Defining the Data Mesh Architecture”, explores the core components of Data Mesh, focusing on the architecture of data products as well as the broader Data Mesh architecture and highlighting how various artifacts and development, runtime, and operational capabilities come together to create discoverable, observable, and operable data products. This chapter also delves into how these
components are integrated through Data Mesh backbone services, marketplaces, and registries, using Climate Quantum Inc. as a case study to illustrate the practical application of these principles for managing complex climate data. Chapter 5, “Driving Data Products with Data Contracts”, discusses the implementation of data products, emphasizing the role of data contracts in establishing trust by ensuring data quality and service levels and using examples from Climate Quantum Inc. to illustrate these concepts. The chapter explores the principles of product thinking, details the elements of data contracts, and introduces the data quality of service (data QoS) framework for combining dimensions of data quality with service-level agreements, which promotes a standardized, reliable approach to data management. Chapter 6, “Building Your First Data Product”, guides you through the steps of creating your initial data product by understanding its components, leveraging data contracts, connecting data sources, and building endpoints while ensuring that observability, discovery, and control services are integrated. The chapter emphasizes the standardization and modularity of data products, facilitated by using sidecars
and open standards like the ones promoted by the Bitol project to streamline development and operations. Chapter 7, “Aligning with the Experience Planes”, explains how to separate responsibilities across three functional areas in a Data Mesh: the infrastructure experience plane for data infrastructure, the data product experience plane for independent data products, and the mesh experience plane for interconnecting data products and managing enterprise-level tools. Each of these areas has specific capabilities to streamline organization and reduce cognitive load. The chapter also delves into how these planes communicate, particularly focusing on feedback loops, both user and system, which travel across planes to enhance data reliability and inform continuous improvement. Chapter 8, “Meshing Your Data Products”, explains how to register, assemble, and utilize multiple data products within a Data Mesh to enhance their value and ensure data quality and governance. This chapter also focuses on the key concepts of producer-aligned and consumer-aligned data products. Finally, you will learn how Data Mesh can simplify data lineage. Part III, “GenAI, Teams, Operating Model, and Roadmap for Data Mesh”, focuses mainly on the operating and social aspects of Data Mesh:
Chapter 9, “Running and Operating Your Data Mesh”, explores how to make data products discoverable, observable, and secure, highlighting the dynamic nature of data within Data Mesh, the crucial interfaces and processes involved in ensuring seamless operation, and the opportunities for enhanced data management through standardization and self-serve capabilities, all of which ultimately foster a more agile and efficient data ecosystem. Chapter 10, “Creating a Data Mesh Marketplace”, addresses the challenge of finding data products in a growing Data Mesh ecosystem by proposing a Data Mesh Marketplace, which, unlike traditional data catalogs, provides a dynamic, user-friendly platform for data discovery, consumption, and sharing that leverages self-serve capabilities and minimizes metadata duplication. Chapter 11, “Establishing Data Mesh Governance”, explains how the self-serve capability and embedded agents within dynamic data products facilitate a more agile, federated approach to data governance, emphasizing certification for compliance, which decentralizes policy enforcement to data product owners while maintaining centralized policy definition. Chapter 12, “Understanding Data Product Supply Chains”, explains how the embedded services and self-serve
capabilities of data products enable the creation of consistent, efficient, and repeatable “data product factories” and establish a dynamic data supply chain ecosystem analogous to modern manufacturing supply chains. Chapter 13, “Integrating Data Mesh and Generative AI”, reveals that by combining the decentralized wonders of Data Mesh with the mind-blowing capabilities of generative AI, organizations can turbocharge their data-driven decision-making processes, creating a future where even your data products have the brains to make your business smarter! Chapter 14, “Establishing Data Mesh Teams”, emphasizes that successful Data Mesh implementation relies 20% on technology and 80% on winning over people, with data product teams acting like autonomous “data product factories” within a sociotechnical ecosystem while interacting with platform and enabling teams to create a flourishing data-driven environment. Chapter 15, “Defining a Data Mesh Operating Model”, explains how Data Mesh requires a shift from traditional centralized data management to a decentralized, domain-centric approach, involving the creation of an operating model that aligns people, processes, and technology to manage, share, and utilize data products efficiently across an organization.