Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric
Second Edition
Piethein Strengholt

“For startups, larger companies, and old industrial giants alike, this pragmatic yet visionary book is the essential guide to modern data management. It’s about scaling and staying competitive. There is no other book like this one out there. It’s a must-read.”
Ole Olesen-Bagneux, author of The Enterprise Data Catalog

“Data management is challenging enough at a small scale. This book provides a very detailed and solid foundation for data management today and in the future.”
Joe Reis, coauthor of Fundamentals of Data Engineering and “recovering data scientist”

Data management is subject to disruption. Trends like artificial intelligence, cloudification, ecosystem connectivity, microservices, open data, software as a service, and new software delivery models are causing a paradigm shift in the way data management is practiced. Organizations need to face the fact that decentralization is inevitable.

In this practical book, author Piethein Strengholt explains how to establish a future-proof and scalable data management practice. He’ll cut through new concepts like data mesh and data fabric and demonstrate what a next-gen data architecture will look like. Executives, architects and engineers, analytics teams, and compliance and governance staff will learn how to shape data management according to their needs. Strengholt provides blueprints, principles, observations, best practices, and patterns to get you up to speed.

This guide shows you how to:
• Examine data management trends, including regulatory requirements, privacy concerns, and new developments such as data mesh and data fabric
• Go deep into building a modern data architecture, including cloud data landing zones, domain-driven design, data product design, and more
• Explore data governance and data security, master data management, self-service data marketplaces, and the importance of metadata

Piethein Strengholt is the chief data officer for Microsoft Netherlands. He acts as a counterpart to CDOs and is a driving force in the community and the product group. Piethein is also a prolific blogger and regularly speaks about the latest trends in data management, including the data mesh concept, data governance, and strategy at scale.

US $69.99 CAN $87.99
ISBN: 978-1-098-13886-8

Data Management at Scale
Modern Data Architecture with Data Mesh and Data Fabric
SECOND EDITION
Piethein Strengholt
Boston, Farnham, Sebastopol, Tokyo, Beijing

Data Management at Scale
by Piethein Strengholt

Copyright © 2023 Piethein Strengholt. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Michelle Smith
Development Editor: Shira Evans
Production Editor: Katherine Tozer
Copyeditor: Rachel Head
Proofreader: Piper Editorial Consulting, LLC
Indexer: nSight, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

April 2023: Second Edition

Revision History for the Second Edition
2023-04-10: First Release

See https://oreilly.com/catalog/errata.csp?isbn=9781098138868 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Data Management at Scale, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

This work is part of a collaboration between O’Reilly and Microsoft. See our statement of editorial independence.

978-1-098-13886-8
[LSI]

Table of Contents

Foreword
Preface

1. The Journey to Becoming Data-Driven
    Recent Technology Developments and Industry Trends
    Data Management
    Analytics Is Fragmenting the Data Landscape
    The Speed of Software Delivery Is Changing
    The Cloud’s Impact on Data Management Is Immeasurable
    Privacy and Security Concerns Are a Top Priority
    Operational and Analytical Systems Need to Be Integrated
    Organizations Operate in Collaborative Ecosystems
    Enterprises Are Saddled with Outdated Data Architectures
    The Enterprise Data Warehouse: A Single Source of Truth
    The Data Lake: A Centralized Repository for Structured and Unstructured Data
    The Pain of Centralization
    Defining a Data Strategy
    Wrapping Up

2. Organizing Data Using Data Domains
    Application Design Starting Points
    Each Application Has a Data Store
    Applications Are Always Unique
    Golden Sources
    The Data Integration Dilemma
    Application Roles
    Inspirations from Software Architecture
    Data Domains
    Domain-Driven Design
    Business Architecture
    Domain Characteristics
    Principles for Distributed and Domain-Oriented Data Management
    Design Principles for Data Domains
    Best Practices for Data Providers
    Domain Ownership Responsibilities
    Transitioning Toward Distributed and Domain-Oriented Data Management
    Wrapping Up

3. Mapping Domains to a Technology Architecture
    Domain Topologies: Managing Problem Spaces
    Fully Federated Domain Topology
    Governed Domain Topology
    Partially Federated Domain Topology
    Value Chain–Aligned Domain Topology
    Coarse-Grained Domain Topology
    Coarse-Grained and Partially Governed Domain Topology
    Centralized Domain Topology
    Picking the Right Topology
    Landing Zone Topologies: Managing Solution Spaces
    Single Data Landing Zone
    Source- and Consumer-Aligned Landing Zones
    Hub Data Landing Zone
    Multiple Data Landing Zones
    Multiple Data Management Landing Zones
    Practical Landing Zones Example
    Wrapping Up

4. Data Product Management
    What Are Data Products?
    Problems with Combining Code, Data, Metadata, and Infrastructure
    Data Products as Logical Entities
    Data Product Design Patterns
    What Is CQRS?
    Read Replicas as Data Products
    Design Principles for Data Products
    Resource-Oriented Read-Optimized Design
    Data Product Data Is Immutable
    Using the Ubiquitous Language
    Capture Directly from the Source
    Clear Interoperability Standards
    No Raw Data
    Don’t Conform to Consumers
    Missing Values, Defaults, and Data Types
    Semantic Consistency
    Atomicity
    Compatibility
    Abstract Volatile Reference Data
    New Data Means New Ownership
    Data Security Patterns
    Establish a Metamodel
    Allow Self-Service
    Cross-Domain Relationships
    Enterprise Consistency
    Historization, Redeliveries, and Overwrites
    Business Capabilities with Multiple Owners
    Operating Model
    Data Product Architecture
    High-Level Platform Design
    Capabilities for Capturing and Onboarding Data
    Data Quality
    Data Historization
    Solution Design
    Real-World Example
    Alignment with Storage Accounts
    Alignment with Data Pipelines
    Capabilities for Serving Data
    Data Serving Services
    File Manipulation Service
    De-Identification Service
    Distributed Orchestration
    Intelligent Consumption Services
    Direct Usage Considerations
    Getting Started
    Wrapping Up

5. Services and API Management
    Introducing API Management
    What Is Service-Oriented Architecture?
    Enterprise Application Integration
    Service Orchestration
    Service Choreography
    Public Services and Private Services
    Service Models and Canonical Data Models
    Parallels with Enterprise Data Warehousing Architecture
    A Modern View of API Management
    Federated Responsibility Model
    API Gateway
    API as a Product
    Composite Services
    API Contracts
    API Discoverability
    Microservices
    Functions
    Service Mesh
    Microservice Domain Boundaries
    Ecosystem Communication
    Experience APIs
    GraphQL
    Backend for Frontend
    Practical Example
    Metadata Management
    Read-Oriented APIs Serving Data Products
    Wrapping Up

6. Event and Notification Management
    Introduction to Events
    Notifications Versus Carried State
    The Asynchronous Communication Model
    What Do Modern Event-Driven Architectures Look Like?
    Message Queues
    Event Brokers
    Event Processing Styles
    Event Producers
    Event Consumers
    Event Streaming Platforms
    Governance Model
    Event Stores as Data Product Stores
    Event Stores as Application Backends
    Streaming as the Operational Backbone
    Guarantees and Consistency
    Consistency Level
    Processing Methods
    Message Order
    Dead Letter Queue
    Streaming Interoperability
    Governance and Self-Service
    Wrapping Up

7. Connecting the Dots
    Cross-Domain Interoperability
    Quick Recap
    Data Distribution Versus Application Integration
    Data Distribution Patterns
    Application Integration Patterns
    Consistency and Discoverability
    Inspiring, Motivating, and Guiding for Change
    Setting Domain Boundaries
    Exception Handling
    Organizational Transformation
    Team Topologies
    Organizational Planning
    Wrapping Up

8. Data Governance and Data Security
    Data Governance
    The Governance Framework
    Processes: Data Governance Activities
    Making Governance Effective and Pragmatic
    Supporting Services for Data Governance
    Data Contracts
    Data Security
    Current Siloed Approach
    Trust Boundaries
    Data Classifications and Labels
    Data Usage Classifications
    Unified Data Security
    Identity Providers
    Real-World Example
    Typical Security Process Flow
    Securing API-Based Architectures
    Securing Event-Driven Architectures
    Wrapping Up

9. Democratizing Data with Metadata
    Metadata Management
    The Enterprise Metadata Model
    Practical Example of a Metamodel
    Data Domains and Data Products
    Data Models
    Data Lineage
    Other Metadata Areas
    The Metalake Architecture
    Role of the Catalog
    Role of the Knowledge Graph
    Wrapping Up

10. Modern Master Data Management
    Master Data Management Styles
    Data Integration
    Designing a Master Data Management Solution
    Domain-Oriented Master Data Management
    Reference Data
    Master Data
    MDM and Data Quality as a Service
    MDM and Data Curation
    Knowledge Exchange
    Integrated Views
    Reusable Components and Integration Logic
    Republishing Data Through Integration Hubs
    Republishing Data Through Aggregates
    Data Governance Recommendations
    Wrapping Up

11. Turning Data into Value
    The Challenges of Turning Data into Value
    Domain Data Stores
    Granularity of Consumer-Aligned Use Cases
    DDSs Versus Data Products
    Best Practices
    Business Requirements
    Target Audience and Operating Model
    Nonfunctional Requirements
    Data Pipelines and Data Models
    Scoping the Role Your DDSs Play
    Business Intelligence
    Semantic Layers
    Self-Service Tools and Data
    Best Practices
    Advanced Analytics (MLOps)
    Initiating a Project
    Experimentation and Tracking
    Data Engineering
    Model Operationalization
    Exceptions
    Wrapping Up

12. Putting Theory into Practice
    A Brief Reflection on Your Data Journey
    Centralized or Decentralized?
    Making It Real
    Opportunistic Phase: Set Strategic Direction
    Transformation Phase: Lay Out the Foundation
    Optimization Phase: Professionalize Your Capabilities
    Data-Driven Culture
    DataOps
    Governance and Literacy
    The Role of Enterprise Architects
    Blueprints and Diagrams
    Modern Skills
    Control and Governance
    Last Words

Index

Foreword

Whenever we talk about software, we inevitably end up talking about data: how much there is, where it all lives, what it means, where it came from or needs to go, and what happens when it changes. These questions have stuck with us over the years, while the technology we use to manage our data has changed rapidly. Today’s databases provide instantaneous access to vast online datasets; analytics systems answer complex, probing questions; event-streaming platforms not only connect different applications but also provide storage, query processing, and built-in data management tools.

As these technologies have evolved, so have the expectations of our users. A user is often connected to many different backend systems, located in different parts of a company, as they switch from mobile to desktop to call center, change location, or move from one application to another. All the while, they expect a seamless and real-time experience. I think the implications of this are far greater than many may realize. The challenge involves a large estate of software, data, and people that must appear, at least to our users, to be a single joined-up unit.

Managing company-wide systems like this has always been a dark art, something I got a feeling for when I helped build the infrastructure that backs LinkedIn. All of LinkedIn’s data is generated continuously, 24 hours a day, by processes that never stop. But when I first arrived at the company, the infrastructure for harnessing that data was often limited to big, slow, batch data dumps at the end of the day and simplistic lookups, jerry-rigged together with homegrown data feeds. The concept of “end-of-the-day batch processing” seemed to me to be some legacy of a bygone era of punch cards and mainframes. Indeed, for a global business, the day doesn’t end.

As LinkedIn grew, it too became a sprawling software estate, and it was clear to me that there was no off-the-shelf solution for this kind of problem. Furthermore, having built the NoSQL databases that powered LinkedIn’s website, I knew that there was an emerging renaissance of distributed systems techniques, which meant solutions could be built that weren’t possible before. This led to Apache Kafka, which combined scalable messaging, storage, and processing over the profile updates, page visits, payments, and other event streams that sat at the core of LinkedIn.

While Kafka streamlined LinkedIn’s dataflows, it also affected the way applications were built. Like many Silicon Valley firms at the turn of the last decade, we had been experimenting with microservices, and it took several iterations to come up with something that was both functional and stable. This problem was as much about data and people as it was about software: a complex, interconnected system that had to evolve as the company grew. Handling a problem this big required a new kind of technology, but it also needed a new skill set to go with it.

Of course, there was no manual for navigating this problem back then. We worked it out as we went along, but this book may well have been the missing manual we needed. In it, Piethein provides a comprehensive strategy for managing data not simply in a solitary database or application but across the many databases, applications, microservices, storage layers, and all other types of software that make up today’s technology landscapes.

He also takes an opinionated view, with an architecture to match, grounded in a well-thought-out set of principles. These help to bound the decision space with logical guardrails, inside of which a host of practical solutions should fit. I think this approach will be very valuable to architects and engineers as they map their own problem domain to the trade-offs described in this book. Indeed, Piethein takes you on a journey that goes beyond data and applications into the rich fabric of interactions that bind entire companies together.

Jay Kreps
Cofounder and CEO at Confluent

Preface

Data management is an emerging and disruptive subject. Datafication is everywhere. This transformation is happening all around us: in smartphones, TV devices, ereaders, industrial machines, self-driving cars, robots, and so on. It’s changing our lives at an accelerating speed.

As the amount of data generated skyrockets, so does its complexity. Disruptive trends like cloudification, API and ecosystem connectivity, microservices, open data, software as a service (SaaS), and new software delivery models have a tremendous effect on data management. In parallel, we see an enormous number of new applications transforming our businesses. All these trends are fragmenting the data landscape.

As a result, we are seeing more point-to-point interfaces, endless discussions about data quality and ownership, and plenty of ethical and legal dilemmas regarding privacy, safety, and security. Agility, long-term stability, and clear data governance compete with the need to develop new business cases swiftly. We sorely need a clear vision for the future of data management.

This book’s perspective on data management is informed by my personal experience driving the data architecture agenda for a large enterprise as chief data architect. Executing that role showed me clearly the impact a good data strategy can have on a large organization. After leaving that company, I started working as the chief data officer for Microsoft Netherlands. In this exciting new position, I’ve worked with over 50 large customers discussing and attempting to come up with a perfect data solution. Here are some of the common threads I’ve identified across all enterprises:

• An overarching data strategy is often missing or not connected to the business objectives. Discussions about data management mostly pivot to technology trends and engineering discussions. What is needed is business engagement: a good strategy and well-thought-out data management and analysis plan that includes tangible value in the form of business use cases. To make my point: the focus must be put on usage and turning data into business value.

• Enterprises have difficulties in interpreting new concepts like the data mesh and data fabric, because pragmatic guidance and experiences from the field are missing. In addition to that, the data mesh fully embraces a decentralized approach, which is a transformational change not only for the data architecture and technology, but even more so for organization and processes. This means the transformation cannot only be led by IT; it’s a business transformation as well.
• Enterprises find it difficult to comprehend the latest technology trends. They’re unable to interpret nuances or make pragmatic choices.
• Enterprises struggle to get started: large ambitions often end with limited action; the execution plan and architecture remain too high-level, too conceptual; top-down commitment from leadership is missing.

These experiences and my observations across a range of enterprises inspired me to write this second edition of Data Management at Scale. You may wonder why this book is worth reading over the first edition. Let’s take a closer look.

Why I Wrote This Book and Why Now

The first edition was founded on the experience I gained while working at ABN AMRO as chief data architect.[1] In that role, my team and I practiced the approach of federation: shifting activities and responsibilities in response to the need for a faster pace of change. We used governance for balancing the imperatives of centralization and decentralization. This shift was supported by a central data team that started to develop platforms for empowering business units to meet their goals. With platforms, we introduced self-service and aligned analysts to domains, supporting them in implementing their use cases. We experimented with domain-driven design and eventually switched to business architecture for managing the architectural landscape as a whole. I used all these experiences as input for writing the first edition.

[1] The statements and opinions expressed in this book don’t necessarily reflect the positions of ABN AMRO or Microsoft.

The term data mesh as a description of a sociotechnical approach to using data at large was coined at around the time the manuscript for the first edition was being finalized. When Zhamak Dehghani’s article describing the concept appeared on Martin Fowler’s website, it revealed concrete names for concepts we’d already been using at ABN AMRO for many years. These names became industry terms, and the concept quickly began to resonate with large organizations as a solution to the friction enterprises encounter when scaling up.

So, why write a second edition? To start with, it was the data mesh concept. I love the ideas of bringing data management and software architecture closer together and businesses taking ownership of their data, but I firmly believe that, with all the fuss, a more nuanced view is needed. In my previous role as an enterprise architect, we had hundreds of application teams, thousands of services, and many large legacy applications to manage. In such situations, you approach complexity differently. With the data mesh architecture, artist, song, and playlist are often used as data domain examples. This approach of decomposing data into fine-grained domains might work well when designing microservices, but it isn’t well suited to (re)structuring large data landscapes. A different viewpoint is needed for scale.

Next, a more nuanced and pragmatic view of data products is needed. There are good reasons why data must be managed holistically and end-to-end. Enterprises have reusability and consistency concerns. They’re forced by regulation to conform to the same dimensions for group reporting, accounting, financial reporting, and auditing and risk management. I know this might sound controversial, but a data product cannot be advocated to be managed as a container: something that packages data, metadata, code, and infrastructure all together in an architecture as tiny as a microservice. This doesn’t reflect how today’s big data platforms work.

Finally, the data mesh story isn’t complete: it focuses only on data that is used for analytical purposes, not operational purposes; it omits master data management;[2] the consumer side must be complemented with an intelligent data fabric; and it doesn’t provide much data modeling guidance for building data products.

[2] The terminology “master/slave” is clearly offensive, and many organizations have switched to alternatives like “source/replica” or “primary/subordinate.” We strive to be as inclusive as possible, but will use “master data management” in this book because the industry hasn’t yet adopted an alternative.

Another incentive for publishing a second edition was concerns about the book’s practicality. The first version was perceived by various readers as too abstract. Some critical reviewers even left comments questioning my hands-on experience. In this second edition I’ve worked hard to address these concerns, providing many real-world examples and concrete solution diagrams. From time to time, I also refer to blog posts that I’ve written about how to implement designs. One final note on this: there are a large number of very complex topics to cover, which are also highly context-sensitive. It would be impossible to provide examples of everything in a single volume, so I’ve had to use some discretion.

I’m excited to share my thoughts on best practices and observations from the field, and I hope this book inspires you. Reflecting on my time working at ABN AMRO, there are lots of good lessons to be taken from other enterprises. I’ve seen a lot of good approaches. There’s no right or wrong when building good data architecture; it’s all about making the right trade-offs and discovering what works best for your situation.

If you’ve already read the first edition, you should find this one significantly different and much improved. Structurally it’s more or less the same, but every chapter has been revised and enhanced. All the diagrams have also been revised, new content has been added, and it’s much more practical. Within each chapter you’ll find many tips, starting points, and references to helpful articles.

Who Is This Book For?

This book is intended for large enterprises, though smaller organizations may find much of value in it. It’s geared toward:

Executives and architects
    Chief data officers, chief technology officers, chief architects, enterprise architects, and lead data architects
Analytics teams
    Data scientists, data engineers, data analysts, and heads of analytics
Development teams
    Data engineers, data scientists, business intelligence engineers, data modelers and designers, and other data professionals
Compliance and governance teams
    Chief information security officers, data protection officers, information security analysts, regulatory compliance heads, data stewards, and business analysts

How to Read or Use This Book

It’s important to say up front that this book touches upon a lot of complex topics that are often interrelated or intertwined with other subjects. So we’ll be hopping between different technologies, business methods, frameworks, and architecture patterns. From time to time I bring in my own operational experience when implementing different architectures, so we’ll be working at different levels of abstraction.

To describe the journey through the book, I’ll use the analogy of a helicopter ride. We’ll start with a zoomed-out view, looking at data management, data strategy, and data architecture at an abstract and higher level. From this helicopter view, we’ll start to zoom in and first explore what data domains and landing zones are. We’ll then fly to the source system side of our landscape, in which applications are managed and data is created, and circle until we have covered most of the areas of data management. Then we’ll fly over to the consumer side of the landscape and start learning about the dynamics there. After that, we’ll bring everything we’ve covered together by putting things into practice.

To help you navigate through the book, the following table gives a high-level overview of which subjects will be intensively discussed in each chapter.

Table P-1. Key topics in each chapter
Columns: Ch. 1, Ch. 2, Ch. 3, Ch. 4, Ch. 5, Ch. 6, Ch. 7, Ch. 8, Ch. 9, Ch. 10, Ch. 11, Ch. 12
Data management: x
Data strategy: x x x x
Data architecture: x x x x
Data integration: x x x x
Data modeling: x x x
Data governance: x
Data security: x
Data quality: x
Metadata management: x
MDM: x
Business intelligence: x
Advanced analytics: x
Enterprise architecture: x x

Chapter 1 introduces the topic of data management. It gives a contextual view of what data management is, how it’s changing, and how it affects our digital transformation. It provides an assessment of the state of the field in recent years and guidance for working out a data strategy. In Chapter 2, we’ll jump into the details of managing data at large, exploring domain-driven design and business architecture as methodologies for managing a large data landscape using data domains. Next, Chapter 3 focuses on topologies and data landing zones as a way of structuring your data architecture and aligning with your data domains.

The following chapters discuss the specifics of distributing data. Chapter 4 focuses on data products, Command Query Responsibility Segregation (CQRS), and guiding principles, and presents an example solution design. Chapter 5 discusses API management, and Chapter 6 covers event and notification management. Chapter 7 brings it all together for a comprehensive overview, complemented with architecture guidance and experience.

Next, we delve deeper into more advanced aspects of data management. Chapter 8 examines how to approach data governance and security in ways that are practical and sustainable for the long term, even in rapidly changing times. Chapter 9 is a deep dive into the use, significance, and democratizing potential of metadata. Chapter 10 offers guidance on using master data management (MDM) to keep data consistent over distributed, wide-ranging assets, while Chapter 11 addresses turning data into value. Chapter 12 concludes the book with an example of making it real and a vision for the future of data management and enterprise architecture.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com.
