Building Data Integration Solutions: Unifying Data for Enhanced Decision Making

Author: Jay Borthen


Are you struggling to manage and make sense of the vast streams of data flowing into your organization? In today's data-driven world, the ability to effectively unify and organize disparate data sources is not just an advantage—it's a necessity. The challenge lies in navigating the complexities of data diversity, volume, and regulatory demands, which can overwhelm even the most seasoned data professionals.

In this essential book, Jay Borthen offers a comprehensive guide to understanding the art of data integration. This book dives deep into the processes and strategies necessary for creating effective data pipelines that ensure consistency, accuracy, and accessibility of your data. Whether you're a novice looking to understand the basics or an experienced professional aiming to refine your skills, Borthen's insights and practical advice, grounded in real-world case studies, will empower you to transform your organization's data handling capabilities.

• Understand various data integration solutions and how different technologies can be employed
• Gain insights into the relationship between data integration and the overall data life cycle
• Learn to effectively design, set up, and manage data integration components within pipelines
• Acquire the knowledge to configure pipelines, perform data migrations, transformations, and more


ISBN: 978-1-098-17306-7
US $79.99 / CAN $99.99

Jay Borthen is head of data science and engineering at Swish Data and has nearly 20 years of experience leading technical teams and delivering data solutions for clients including the IRS, FDA, and US Navy.
“By combining clear explanations of foundational concepts with a pragmatic, hands-on guide to implementation, this book is both approachable for practitioners and authoritative in scope.”
    Chris Whitlock, coauthor of Winning the National Security AI Competition

“An essential blueprint for architects, this book details how to build a robust data solution by leveraging key AWS cloud services and embracing hands-on integration.”
    Matthew Martz, software architect and AWS community builder

“An invaluable resource for unlocking the potential of data integration, this book provides proven strategies for navigating enterprise challenges.”
    Sean Applegate, Chief Technology Officer, Swish Data
Building Data Integration Solutions
by Jay Borthen

Copyright © 2026 Jay Borthen. All rights reserved.

Published by O’Reilly Media, Inc., 141 Stony Circle, Suite 195, Santa Rosa, CA 95401.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (https://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Andy Kwan
Development Editor: Jeff Bleiel
Production Editor: Elizabeth Faerm
Copyeditor: Vanessa Moore
Proofreader: Arthur Johnson
Indexer: nSight, Inc.
Cover Designer: Susan Brown
Cover Illustrator: José Marzan Jr.
Interior Designer: David Futato
Interior Illustrator: Kate Dullea

November 2025: First Edition

Revision History for the First Edition
2025-10-28: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098173067 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Building Data Integration Solutions, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-17306-7 [LSI]
Table of Contents

Preface  ix

Part I. Foundations of Data Integration

1. Introduction to Data Integration  3
    Data Integration and Data Management  3
    Defining Data Integration  5
    Why Data Integration Is Important  6
    The Evolution of Data Integration  7
    Data Integration Use Cases and Case Studies  8
        Healthcare  9
        Tax Administration  10
        Immigration and Border Control  10
    Conclusion  10

2. Key Concepts in Data Integration  13
    Data Properties  13
        Data Types  14
        Data Structure Types  14
        Metadata  15
        Data Orientation  15
        Encodings  16
        File Formats  18
        Data Context  21
    Data Stores  22
        Types of Storage  23
        Data Models and Management Systems  27
        Hybrid and Multicloud Storage  34
    Data Movement and Transformation  34
        Connectors and Connections  35
        Migration  36
        Ingestion  37
        Replication  38
        Batches, Streams, and Events  38
        Pipelines  42
        Conditioning  44
        Change Data Capture  45
    Integration Management  46
        Data Services  47
        Data Orchestration  47
    Conclusion  48

3. Data Integration Challenges  49
    Organizational Issues  49
    Technical Challenges  52
        Data Quality  52
        Data Processing  53
        Security and Compliance  56
    Conclusion  56

4. Models, Architectures, Methods, and Patterns  57
    Models  58
        Conceptual Data Integration Models  58
        Logical Data Integration Models  59
        Physical Data Integration Methods  60
    Architectures  61
        Hub-and-Spoke  61
        Point-to-Point  62
        Enterprise Service Bus  63
        Federation  64
    Methods  65
    Patterns  66
        Ingestion Patterns  66
        Data Consolidation Pattern  69
        Data Replication and Propagation Pattern  70
        Data Virtualization Pattern  70
        Event-Driven Integration Pattern  71
    Conclusion  72

Part II. Tools, Technologies, and Frameworks

5. Data Integration Tool Options  77
    Open Source Versus Commercial Solutions  77
        Advantages of Open Source Solutions  78
        Advantages of Commercial Solutions  78
    Programming Languages Versus Low-Code/No-Code Platforms  79
    Cloud Versus On-Premises Architectures  80
        On-Premises Considerations  81
        Cloud Service Providers  82
    Distributed Versus Centralized Data Systems  83
    In-Memory Processing  84
    Security and Compliance  85
    Conclusion  86

6. Data Stores and Management Systems  87
    Relational Databases  88
        IBM Db2  88
        Microsoft SQL Server  89
        MySQL and MariaDB  90
        Oracle Database  91
        PostgreSQL  92
        SQLite  92
        Sybase and SAP  93
    Non-Relational Databases  93
        Document Stores and Key-Value Storage  94
        Graph Databases  96
        Vector Databases  98
        Wide-Column Databases  101
    Data Warehouses  103
        Amazon Redshift  103
        Apache Doris, Druid, Hadoop, and Hive  104
        Cloudera Data Warehouse  106
        IBM Db2 Warehouse  106
        Snowflake  107
    Data Lakes and Lakehouses  107
        Amazon Simple Storage Service  107
        Apache Hudi and Iceberg  108
        Azure Blob Storage  109
        Delta Lake  109
        Google Cloud Storage  111
        IBM Cloud Storage Services  111
    Conclusion  111

7. Data Ingestion and Streaming Tools  113
    Apache Beam, Flink, Spark, and Storm  113
    Apache NiFi  116
    AWS Glue and Amazon Kinesis  117
    Azure Event Hubs  119
    Confluent and Kafka  119
    Conclusion  121

8. Comprehensive Integration Suites  123
    AWS Glue, Amazon Elastic MapReduce, and Amazon Q  124
    Azure Data Factory  125
    Databricks  126
    Fivetran  128
    IBM DataStage and App Connect  129
    IICS and PowerCenter  129
    Microsoft SQL Server Integration Services  130
    MuleSoft  130
    Oracle Data Integrator and GoldenGate  131
    Pentaho  131
    Qlik, Talend, and Stitch  131
    TIBCO  133
    Conclusion  134

Part III. Introducing the Example Data Integration Solution

9. Introducing the Example Solution  137
    Objectives  137
    Initial State  138
    Planned Architecture  139
    Conclusion  140

10. Implementing a Batch Solution  141
    Setting Up Qlik Replicate  141
        Setting Up a Windows Server EC2 Instance for Qlik Replicate  141
        Installing and Downloading Qlik Replicate  148
        Setting Up Endpoint Connections  152
    Setting Up Databricks  157
        Setting Up Databricks in AWS  157
        Connecting Databricks  163
    Conclusion  172

11. Implementing a Streaming Solution  173
    Raspberry Pi and Sensor Setup  173
        Bill of Materials  174
        Sensor Configuration  175
    Creating a Confluent Cloud Cluster  176
    Creating a Local Python Environment  179
    Cluster Settings  179
    Creating a Topic  182
    Configuring a Client  185
    Creating the Python Producer and Consumer Applications  189
    Setting Up a Connector  193
    Conclusion  201

A. Setting Up the Data Integration Solution Example  203

B. References  219

Key Terms Glossary  233

Acronyms Glossary  243

Index  259
Preface

This book presents a pragmatic, hands-on approach to data integration by first baselining the reader’s knowledge with important terminology and concepts and eventually walking through the building of a plausible, real-life data integration solution, step by step. For the hands-on parts in the later chapters of the book, familiarity with Linux,[1] Python, Structured Query Language (SQL), and Amazon Web Services (AWS) would be beneficial, but I’ll attempt to explain what takes place at each step in simple terms.

The combinations of tools and techniques that are described in this book are almost surely not “the best” for your specific use case. There are far too many variables and trade-offs to consider for an adequate presentation of all possible solutions. However, many of the technologies that are discussed are considered dominant players by some of the leading advisory and consultancy firms and have a significant presence within the US federal government. In this book, I prioritize the tools and technologies that meet current government mandates such as HIPAA and FedRAMP (see “Security and Compliance” on page 85 for a more in-depth discussion regarding government regulations).

It should be noted that containerization is not utilized in the hands-on sections of this book. However, it may be prudent for the practitioner to use containers (e.g., Docker) and perhaps even a distributed container management tool like Kubernetes for large, enterprise data integration initiatives.

Also, while the book focuses on aspects directly related to data integration, because of inherent complexity within data engineering and data management, topic tangents and parallel subject matter discussions are inevitable. There are also many concepts included in the book that live within the blurred lines between data engineering and software development.

[1] O’Reilly Media has some great resources to explore Linux, particularly its Linux Sandbox.
Further, my intention is not to delve too deeply into any single concept but rather to brush the surface enough to understand where the concepts are applicable within a data integration solution and to assist with the hands-on integrations in later chapters.

Overview of the Book Structure and What Readers Can Expect to Learn

In the first part of the book, readers will explore the foundational principles of data integration and its role in modern data management. I explain the importance of data integration methods in unifying and organizing diverse data sources to ensure data accuracy, accessibility, and consistency.

Part I, “Foundations of Data Integration”, focuses on key concepts and processes involved in data integration and its connection to related subjects such as data analytics and data governance. It also covers essential topics and terminology helpful for data engineers, including data properties, structures, types, and encodings. It emphasizes the classification of data into structured, unstructured, and semistructured categories and the significance of understanding these classifications for effective implementation of data integration solutions.

The second half of Part I delves into the challenges and limitations of data integration, particularly the difficulties in incorporating data from legacy systems and adapting to diverse, rapidly changing data sources. It also examines organizational issues, including the impact of policies and data governance and management practices. Part I lays a solid foundation for understanding the critical aspects of data integration, setting the stage for exploring tools, technologies, and practical implementation strategies in subsequent parts of the book.

Part II, “Tools, Technologies, and Frameworks”, gives an in-depth examination of various data integration software tools and technologies. It starts by describing many of the options available regarding the different tools and technologies, such as open source versus commercial tools. (There is also a hardware aspect to data integration, but that is not the focus of this book.) Open source solutions are praised for being cost-effective, flexible, and supported by active communities, making them ideal for organizations with skilled technical staff and limited budgets. Conversely, commercial tools are noted for their user-friendliness, dedicated customer support, and advanced features such as low-code
and no-code interfaces, as well as typically strong security measures, which help minimize operational risks and ensure compliance.

The section then delves into the growing popularity of low-code and no-code platforms. These platforms enable nontechnical users to perform data integration tasks through intuitive interfaces and prebuilt connectors, significantly speeding up integration processes and reducing reliance on technical teams. The book outlines how these platforms can democratize data access and simplify complex data workflows, making data integration more accessible to a broader range of users within organizations.

A detailed comparison of cloud and on-premises integration solutions follows, discussing their respective advantages and drawbacks. Cloud integration solutions are recognized for their scalability, flexibility, and cost-effectiveness, but they also raise concerns regarding data security, compliance, and vendor lock-in. On-premises solutions, while offering enhanced control and compliance, are criticized for being less scalable and more expensive to maintain.

The book further explores the capabilities of major cloud service providers such as AWS and Microsoft Azure. It evaluates their unique strengths and ideal use cases, such as the extensive ecosystem and global reach offered by AWS, and Azure’s seamless integration with Microsoft products. This comparative analysis helps readers understand which cloud services might best suit their specific data integration needs.

Part II provides readers with a balanced understanding of various data integration tools and technologies, along with their benefits, limitations, and best use cases. By exploring both traditional and modern solutions, the section equips readers with the knowledge to make informed decisions on the right tools and platforms for their organizational requirements. This knowledge is invaluable for those looking to navigate the complex landscape of data integration technology.

Part III, “Introducing the Example Data Integration Solution”, presents a comprehensive guide to setting up a data integration solution, providing a practical example infrastructure that highlights key components, configurations, and tools necessary for seamless dataflow across various systems. It starts with the foundational elements of infrastructure setup for data integration projects, including database selection, cloud services, and network considerations, emphasizing scalability and maintainability.

The section introduces publicly available datasets from the International Energy Agency (IEA) and the US Energy Information Administration (EIA). These datasets are used in the example integration solution. The architecture employs an Amazon EC2 instance hosting Qlik Replicate, and showcases a hybrid deployment using both Linux and Windows Server environments. Security is mentioned as an essential aspect, though the primary focus remains on the technical components and configuration of data integration technologies. The part delves into practical implementation with AWS, Confluent Kafka, Databricks, and Qlik. It provides step-by-step instructions for setting up Qlik tools alongside an integration of Databricks for unified data analytics. Confluent Kafka is introduced for streamlining event-driven data pipelines, with examples of configurations on Ubuntu Linux.

This part serves as a practical resource for setting up a robust, scalable data integration infrastructure, enabling organizations to unify and optimize their data pipelines for better analytics and decision making. It balances technical detail with actionable insights, offering a blueprint adaptable to various organizational needs.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.
Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/jlb226/building_data_integration_solutions.

If you have a technical question or a problem using the code examples, please send email to support@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Building Data Integration Solutions by Jay Borthen (O’Reilly). Copyright 2026 Jay Borthen, 978-1-098-17306-7.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.
How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
141 Stony Circle, Suite 195
Santa Rosa, CA 95401
800-889-8969 (in the United States or Canada)
707-827-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://oreilly.com/about/contact.html

We have a web page for this book, where we list errata and any additional information. You can access this page at https://oreil.ly/building-data-integration-solutions.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media.
Watch us on YouTube: https://youtube.com/oreillymedia.

Acknowledgments

I owe a special debt to Sean Applegate, Bharath Chandra Memmadisetty, Aarohi Tripathi, and Mark Zalubus for their clear-eyed reviews, technical depth, and practical advice.

To Cara, my wife—thank you for the patience and perspective that made the writing possible.

My sincere thanks to the O’Reilly editorial and production team for their guidance from concept to copyedit. Your clarity, rigor, and craftsmanship raised the quality of this book at every step.
PART I: Foundations of Data Integration

Chapter 1, “Introduction to Data Integration”, delves into the fundamentals of data integration, exploring its critical functions within the data life cycle and its alignment with broader organizational goals. By unifying and organizing diverse data sources, data integration ensures that data is accurate, accessible, and consistent, ultimately enhancing decision-making processes. The chapter outlines key concepts and processes involved in data integration and illustrates its importance in transforming raw data into valuable insights and driving business efficiency. Additionally, it provides an overview of related fields, such as data analytics and data governance, and emphasizes the interconnected nature of these disciplines within a robust data management framework.

Chapter 2, “Key Concepts in Data Integration”, introduces key concepts of data integration that form the foundation of effective data management strategies and encompasses a variety of terms and practices essential for data engineers. The chapter highlights the importance of understanding and correctly applying terms related to data properties, data structures, data types, and encodings. The chapter covers the classification of data into structured, unstructured, and semistructured categories and explains their unique characteristics and relevance in the data ecosystem. Additionally, Chapter 2 covers data file formats, metadata, and the context of data usage, and how these elements play critical roles in data integration processes. Establishing a clear understanding of these fundamental concepts will help data engineers better architect and implement durable data integration solutions tailored to organizational needs.
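The structured-versus-semistructured distinction that Chapter 2 covers can be seen in a small sketch (not from the book; the record formats and field names here are invented for illustration). A structured source such as CSV imposes one schema on every record, while a semistructured source such as JSON is self-describing and lets fields vary per record:

```python
import csv
import io
import json

# Structured data: a fixed schema, every record has the same columns (CSV).
csv_source = io.StringIO("id,country,demand_mwh\n1,US,1200\n2,DE,950\n")
structured = list(csv.DictReader(csv_source))

# Semistructured data: self-describing records whose fields may vary (JSON).
json_source = (
    '[{"id": 1, "country": "US", "demand_mwh": 1200, "notes": "revised"},'
    ' {"id": 2, "country": "DE"}]'
)
semistructured = json.loads(json_source)

# Every structured record exposes the same field set...
assert all(rec.keys() == {"id", "country", "demand_mwh"} for rec in structured)

# ...while the semistructured records carry different field sets per record.
field_sets = {frozenset(rec) for rec in semistructured}
print(len(field_sets))  # → 2 distinct field sets
```

In practice this is why semistructured sources need schema inference or explicit handling of optional fields before they can be joined with structured ones.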
Chapter 3, “Data Integration Challenges”, addresses the common obstacles and limitations that organizations face when attempting to integrate data from multiple sources. We will examine technical, data, and organizational challenges and provide insights into why data integration can be a complex and resource-intensive process. The chapter explores strategies for overcoming these barriers to ensure successful data integration projects.

The chapter also addresses the technical complexities associated with various data formats, protocols, and standards, alongside the challenges posed by ensuring data quality, consistency, and scalability. By identifying and understanding these challenges, organizations can better navigate the intricate landscape of data integration to maximize the value derived from their data assets.

Chapter 4, “Models, Architectures, Methods, and Patterns”, aims to clarify data integration concepts by exploring foundational elements including models, architectures, methods, and patterns. Each of these components plays a crucial role in shaping how data flows between systems and how efficiency, consistency, and scalability are maintained.

Together, the chapters in Part I of this book provide a solid foundation for understanding important aspects of data integration and set the stage for exploring the tools, technologies, frameworks, and practical implementation strategies covered in Parts II and III.
CHAPTER 1: Introduction to Data Integration

This chapter provides an overview of what data integration actually is, what role it plays in the overall data life cycle, and how it relates to an organization’s data strategy. It aims to equip you with the basic understanding necessary to effectively implement a data integration solution and align the solution to broader organizational objectives.

Data Integration and Data Management

Data life cycle management encompasses all the disciplines related to obtaining and maintaining value from data. Effective data management ensures that data is accurate, available, and accessible and is a primary component in the decision-making process.

You may have heard the term DataOps, which is a style of data management that focuses on collaboration between stakeholders throughout the data life cycle, much the same way that DevOps is centered around collaboration between software development teams. DataOps emphasizes automation, quality, and continuous delivery in data processes, similar to DevOps in software development.

I prefer to partition the management of the data life cycle into three segments. As you can see in Figure 1-1, the segments include data integration, data analytics, and data governance. Each has distinct objectives and consists of lower-level processes that combine to form data pipelines.[1] The lower-level processes sometimes live in the gray area between the components. Data analytics and data governance are no less important than data integration is to the overall data life cycle. Let’s begin with brief descriptions of data analytics and data governance.

[1] See “Pipelines” on page 42 for a discussion on data pipelines.
Figure 1-1. Components of data life cycle management

Data analytics consists of all the tasks you would expect a typical data scientist or data analyst to perform, from creating data visualizations[2] to developing machine learning (ML) models. You could consider analytics to be the frontend of data management. It is the component of data life cycle management that is typically most familiar to the end users and decision makers. Like a web browser is to the internet, data analytics (and, in particular, data visualization) is to data life cycle management.

Data governance, on the other hand, refers to the policies, standards, and practices that ensure data is handled properly throughout the data life cycle while simultaneously aligning with organizational objectives and risk management strategies. It involves all activities related to enforcing data integrity, privacy, and compliance with regulations, and to maintaining authority and control over the management of data assets. For example, the US Department of Defense (DOD) published seven data governance goals they must achieve to become data-centric. The data must be visible, accessible, understandable, linked, trustworthy, interoperable, and secure. These goals are collectively known as VAULTIS, and, although they do not speak to specific laws or regulations, they are designed to ensure that the data is suitable to literally, and figuratively, represent the DOD.

In the parlance of data, life cycle does not necessarily imply a cycle.

[2] I generally consider data visualization and business intelligence (BI) to be elements within data analytics, but I sometimes see them grouped with components of data integration.
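The idea that "lower-level processes combine to form data pipelines" can be sketched in a few lines of Python. This is an illustrative toy, not code from the book: the stage names (ingest, condition, load) echo section titles from Chapter 2, but their bodies and the sample records are invented for illustration:

```python
# A pipeline as an ordered composition of small, single-purpose stages.

def ingest(raw_rows):
    # Ingestion: pull raw records from a source, dropping empty ones.
    return [row.strip() for row in raw_rows if row.strip()]

def condition(rows):
    # Conditioning: normalize each record into a consistent shape.
    return [{"reading": float(r)} for r in rows]

def load(records, sink):
    # Loading: deliver conditioned records to a target store.
    sink.extend(records)
    return sink

def run_pipeline(raw_rows, stages, sink):
    # Run every stage in order; the last stage writes to the sink.
    data = raw_rows
    for stage in stages[:-1]:
        data = stage(data)
    return stages[-1](data, sink)

sink = []
result = run_pipeline([" 21.5 ", "", "19.0\n"], [ingest, condition, load], sink)
print(result)  # → [{'reading': 21.5}, {'reading': 19.0}]
```

Real integration tools replace each of these toy stages with connectors, transformation engines, and managed targets, but the composition principle is the same.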
