Learning OpenTelemetry (Ted Young, Austin Parker)

Authors: Ted Young, Austin Parker


OpenTelemetry is a revolution in observability data. Instead of running multiple uncoordinated pipelines, OpenTelemetry gives users a single integrated stream of high-quality telemetry drawn from many sources: tracing, metrics, logs, RUM, eBPF, and more. This practical guide shows you how to set up, operate, and troubleshoot the OpenTelemetry observability system. Authors Austin Parker, head of developer relations at Lightstep and an OpenTelemetry community maintainer, and Ted Young, cofounder of the OpenTelemetry project, cover every OpenTelemetry component, as well as observability best practices for many popular cloud, platform, and data services such as Kubernetes and AWS Lambda. You'll learn how OpenTelemetry enables OSS libraries…

📄 File Format: PDF
💾 File Size: 4.8 MB

📄 Text Preview (First 20 pages)


📄 Page 1
(This page has no text content)
📄 Page 2
Learning OpenTelemetry Setting Up and Operating a Modern Observability System Ted Young and Austin Parker
📄 Page 3
Learning OpenTelemetry by Ted Young and Austin Parker. Copyright © 2024 Ted Young and Austin Parker. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: John Devins
Development Editor: Sarah Grey
Production Editor: Aleeya Rahman
Copyeditor: Arthur Johnson
Proofreader: Sharon Wilkey
Indexer: nSight, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

March 2024: First Edition
Revision History for the First Edition
📄 Page 4
2024-03-05: First Release. See http://oreilly.com/catalog/errata.csp?isbn=9781098147181 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Learning OpenTelemetry, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. The views expressed in this work are those of the authors and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights. 978-1-098-14718-1 [LSI]
📄 Page 5
Dedication

For Dylan Mae —Austin

For the OpenTelemetry cofounders —Ted
📄 Page 6
Foreword

In the ever-evolving landscape of cloud-native technologies, observing an application’s performance and health is no longer a luxury, but a critical imperative. As microservice architectures become the norm, distributed systems sprawl, and data volumes explode, traditional monitoring tools struggle to keep pace. This is where OpenTelemetry emerges as a game changer, offering a standardized and vendor-neutral approach to observability. OpenTelemetry is not just a technology; it also represents a paradigm shift, “crossing the chasm” from just monitoring to complete observability. OpenTelemetry is changing the industry from silos to unified telemetry. As the authors Ted Young and Austin Parker explain, OpenTelemetry is about embracing a unified telemetry data-driven approach to observability, leveraging open standards like the OpenTelemetry Protocol (OTLP), and being empowered to build and operate fully observable, resilient, high-performing cloud-native applications. Learning OpenTelemetry serves as your comprehensive guide to unlocking the power of OpenTelemetry. Whether you’re a seasoned engineer grappling with the complexities of distributed tracing, a newcomer seeking to understand the fundamentals, or an organization embarking on its observability journey, this book equips you with the knowledge and practical insights to navigate this transformative technology. The authors emphasize that observability requires understanding the cloud-native paradigm’s broader context and inherent challenges. For example, microservice architectures, while offering agility and scalability, introduce new complexities. Traditional monitoring tools,
📄 Page 7
designed for monolithic applications, often struggle to capture the intricate interactions and dependencies between services. This lack of coherence creates visibility gaps, making it difficult to pinpoint performance bottlenecks, diagnose issues, and ensure application health. Learning OpenTelemetry highlights how OpenTelemetry addresses these challenges head-on by providing a unified and vendor-agnostic approach to collecting and exporting telemetry data. This unified approach uses metrics, traces, logs, and profiling to offer a correlated view of your application’s health and performance. The authors delve into the intricacies of OpenTelemetry, guiding us through its core concepts and through instrumentation strategies for different programming languages and frameworks, including shared libraries and shared services. They illuminate best practices for collecting and processing telemetry data using the OpenTelemetry Collector; they survey deployment patterns for scaling telemetry collection on platforms such as Kubernetes, serverless, and data streaming. They’ll show you how to build scalable telemetry pipelines by balancing wide approaches with deep ones, centralized architectures with decentralized ones, and more. The final chapter explores advanced topics, such as generative AI, FinOps, and cloud sustainability. We live in exciting times. As the worlds of cloud-native services and AI applications converge, it’s critical to use telemetry data to understand large-scale model behavior. That’s why the next giant leap in OpenTelemetry’s journey will be to provide an open framework to fully support observability for smart, distributed GenAI applications. Observability, as a practice, must incorporate viable AI models to collect and analyze telemetry at massive scale.
📄 Page 8
So, open this book, dive into the world of OpenTelemetry, and unlock the power of observability for your cloud-native journey. Remember, the path to mastery starts with a single step, and this book is a guide to your first and following steps in that journey. Enjoy!

Alolita Sharma
Palo Alto, California
February 2024

Alolita Sharma is an OpenTelemetry Governance Committee member and has been contributing to the OpenTelemetry project for over five years. She is co-chair for the CNCF Observability Technical Advisory Group (TAG) and leads Apple’s AIML observability practice. She contributes to open source and open standards in OpenTelemetry, Observability TAG, Unicode, and W3C. Alolita has also provided strong leadership for observability, infrastructure, and search engineering at AWS and has managed engineering teams at IBM, PayPal, Twitter, and Wikipedia.
📄 Page 9
Preface

Over the past decade, observability has gone from a niche discipline talked about at events like Monitorama or Velocity (RIP) to a multibillion-dollar industry that touches every part of the cloud native world. The key to effective observability, though, is high-quality telemetry data. OpenTelemetry is a project that aims to provide this data and, in doing so, kick off the next generation of observability tools and practices. If you’re reading this book, it’s highly likely that you’re an observability practitioner—perhaps a developer or an SRE—who is interested in how to profile and understand complex systems in production. You may have picked it up because you’re interested in what OpenTelemetry is, how it fits together, and what makes it different from historical monitoring frameworks. Or maybe you’re just trying to understand what all the hype is about. After all, in just five years, OpenTelemetry has gone from an idea to one of the most popular open source projects in the world. Regardless of why you’re here, we’re glad you came. Our goal in writing this book was not to create a “missing manual” for OpenTelemetry—you can find lots of documentation and tutorials and several other fantastic books that dive deep into implementing OpenTelemetry in specific languages (see Appendix B for details on those). Our goal was to present a comprehensive guide to learning OpenTelemetry itself. We want you to understand not just what the different parts are but how they fit together and why. This book should equip you with the foundational knowledge you’ll need not only to implement
📄 Page 10
OpenTelemetry in a production system but also to extend OpenTelemetry itself—either as a contributor to the project or by making it part of an organizational observability strategy. In general, this book has two main parts. In Chapters 1 through 4, we discuss the current state of monitoring and observability and show you the motivation behind OpenTelemetry. These chapters help you understand the foundational concepts that underpin the entire project. They’re invaluable not just for first-time readers but for anyone who’s been practicing observability for a while. Chapters 5 through 9 move into specific use cases and implementation strategies. We discuss the “how” behind the concepts introduced in earlier chapters and give you pointers on actually implementing OpenTelemetry in a variety of applications and scenarios. If you’re already well versed in observability topics, you might be considering skipping ahead to the latter part of the book. While we can’t stop you, you’ll probably get something out of reviewing the initial chapters. Regardless, as long as you go into this book with an open mind, you should get something out of it—and keep coming back, time after time. We hope this book becomes the foundation for the next chapter of your observability journey.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
📄 Page 11
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
Shows commands or other text that should be typed literally by the user.

Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.

TIP: This element signifies a tip or suggestion.

NOTE: This element signifies a general note.

WARNING: This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/orgs/learning-opentelemetry-oreilly/.
📄 Page 12
If you have a technical question or a problem using the code examples, please send email to support@oreilly.com. This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission. We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Learning OpenTelemetry by Ted Young and Austin Parker (O’Reilly). Copyright 2024 Austin Parker and Ted Young, 9781098147181.” If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

NOTE: For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.
📄 Page 13
Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-827-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://www.oreilly.com/about/contact.html

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/learning-opentelemetry.
📄 Page 14
For news and information about our books and courses, visit https://oreilly.com. Find us on LinkedIn: https://linkedin.com/company/oreilly-media. Watch us on YouTube: https://youtube.com/oreillymedia.

Acknowledgments

The authors would like to thank their entire team at O’Reilly for constant support, encouragement, and grace. Special thanks go out to our acquisitions editor, John Devins, and our developmental editor, Sarah Grey. We also thank our tech reviewers for their invaluable feedback, as well as Alolita Sharma for her contribution. This book would not be possible without the work of every OpenTelemetry contributor across the years.

Austin

I’d like to thank my coauthor for convincing me that writing another book would be a good idea, actually. To my partner, Mandy: thank you for putting up with the long hours and unpredictable nature of writing. Tada gan iarracht.1 I would also like to thank the many people whom I used as sounding boards over the past year or so, and whose friendship and ideas have made it into these words; they include (but are not limited to) Phillip Carter, Alex Hidalgo, Jessica Kerr, Reese Lee, Rynn Mancuso, Ana Margarita Medina, Ben Sigelman, Pierre Tessier, Amy Tobey, Adriana Villela, Hazel Weakly, and Christine Yen. All y’all great humans.
📄 Page 15
Ted

I’d like to thank my coauthor for being convinced that writing another book would be a good idea, actually. I would like to thank all the maintainers of the OpenTracing and OpenCensus projects. Both projects have the same goal: to create a universal standard for describing the computer operations of distributed systems. Choosing to put egos aside, merge the projects, and accept a years-long setback as we started over with OpenTelemetry was a difficult decision. I appreciate the bravery and trust that it took to do this. I would also like to thank the maintainers of the Elastic Common Schema project. This was another case in which having two standards meant that we had no standards. Their willingness to merge ECS into the OpenTelemetry Semantic Conventions was another important step toward our shared goal of a universally accepted telemetry system. It’s a common (and funny) joke to point at OpenTelemetry and bring up the classic XKCD comic #927, “How Standards Proliferate”. But I must say, au contraire, Monsieur chuckles! OpenTelemetry did create a new standard, but in the process it deprecated three other standards. So we are now at minus two standards. I believe this may be a record in the history of standardization. I’m hoping for at least minus four before we’re done.

1 “Nothing without effort.”
📄 Page 16
Chapter 1. The State of Modern Observability

History is not the past but a map of the past, drawn from a particular point of view, to be useful to the modern traveler.
—Henry Glassie, US historian1

This is a book about the difficult problems inherent to large-scale distributed computer systems, and about how to apply OpenTelemetry to help solve those problems. Modern software engineering is obsessed with end-user experience, and end users demand blazing-fast performance. Surveys show that users will abandon ecommerce sites that take more than two seconds to load. You’ve probably spent a fair amount of time trying to optimize and debug application performance issues, and if you’re like us, you’ve been frustrated by how inelegant and inefficient this process can be. There’s either not enough data or too much of it, and what data there is can be riddled with inconsistencies or unclear measurements. Engineers are also faced with stringent uptime requirements. That means identifying and mitigating any issues before they cause a meltdown, not just waiting for the system to fail. And it means moving quickly from triage to mitigation. To do that, you need data. But you don’t need just any data; you need correlated data—data that is already organized, ready to be analyzed by a computer system. As you will see, data with that level of organization has not been readily available. In fact, as
📄 Page 17
systems have scaled and become more heterogeneous, finding the data you need to analyze an issue has become even harder. If it was once like looking for a needle in a haystack, it’s now more like looking for a needle in a stack of needles. OpenTelemetry solves this problem. By turning individual logs, metrics, and traces into a coherent, unified graph of information, OpenTelemetry sets the stage for the next generation of observability tools. And since the software industry is broadly adopting OpenTelemetry already, that next generation of tools is being built as we write this.

The Times They Are A-Changin’

Technology comes in waves. As we write this in 2024, the field of observability is riding its first real tsunami in at least 30 years. You’ve chosen a good time to pick up this book and gain a new perspective! The advent of cloud computing and cloud native application systems has led to seismic shifts in the practice of building and operating complex software systems. What hasn’t changed, though, is that software runs on computers, and you need to understand what those computers are doing in order to understand your software. As much as the cloud has sought to abstract away fundamental units of computing, our ones and zeros are still using bits and bytes. Whether you are running a program on a multiregion Kubernetes cluster or a laptop, you will find yourself asking the same questions:

“Why is it slow?”
📄 Page 18
“What is using so much RAM?”
“When did this problem start?”
“Where is the root cause?”
“How do I fix this?”

The astronomer and science communicator Carl Sagan said, “You have to know the past to understand the present.”2 That certainly applies here: to see why a new approach to observability is so important, you first need to be familiar with traditional observability architecture and its limitations. This may look like a recap of rudimentary information! But the observability mess has been around for so long that most of us have developed quite the pile of preconceptions. So even if you’re an expert—especially if you’re an expert—it is important to have a fresh perspective. Let’s start this journey by defining several key terms we will use throughout this book.

Observability: Key Terms to Know

First of all, what is observability observing? For the purposes of this book, we are observing distributed systems. A distributed system is a system whose components are located on different networked computers that communicate and coordinate their actions by passing messages to one another.3 There are many kinds of computer systems, but these are the ones we’re focusing on.
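To make the questions listed above concrete, here is a minimal sketch (not taken from the book's own examples) of timing one operation with the OpenTelemetry tracing API in Python. The service name, span name, and cart_id are hypothetical, and without an SDK configured these calls are harmless no-ops.

    # Minimal sketch: wrap one operation in a span so "Why is it slow?" and
    # "When did this problem start?" have data behind them.
    # "checkout-service", "load-checkout-page", and cart_id are hypothetical.
    from opentelemetry import trace

    tracer = trace.get_tracer("checkout-service")

    def render_page(cart_id: str) -> None:
        pass  # placeholder for real application work

    def load_checkout_page(cart_id: str) -> None:
        # The span records a start timestamp, an end timestamp, and any
        # attributes attached to it, so a backend can show where time went.
        with tracer.start_as_current_span("load-checkout-page") as span:
            span.set_attribute("cart.id", cart_id)
            render_page(cart_id)

    load_checkout_page("cart-123")

The API alone records nothing; an SDK and an exporter have to be configured before spans are actually collected and sent anywhere.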
📄 Page 19
WHAT COUNTS AS DISTRIBUTED?

Distributed systems aren’t just applications running in the cloud, microservices, or Kubernetes applications. Macroservices or “monoliths” that use service-oriented architecture, client applications that communicate with a backend, and mobile and web apps are all somewhat distributed and benefit from observability.

At the highest level, a distributed system consists of resources and transactions:

Resources
These are all the physical and logical components that make up a system. Physical components, such as servers, containers, processes, RAM, CPU, and network cards, are all resources. Logical components, such as clients, applications, API endpoints, databases, and load balancers, are also resources. In short, resources are everything from which the system is actually constructed.

Transactions
These are requests that orchestrate and utilize the resources the system needs to do work on behalf of the user. Usually, a transaction is kicked off by a real human, who is waiting for the task to be completed. Booking a flight, hailing a rideshare, and loading a web page are examples of transactions.

How do we observe these distributed systems? We can’t, unless they emit telemetry. Telemetry is data that describes what your system is doing. Without telemetry, your system is just a big black box filled with mystery. Many developers find the word telemetry confusing. It’s an overloaded term. The distinction we draw in this book, and
📄 Page 20
in systems monitoring in general, is between user telemetry and performance telemetry:

User telemetry
Refers to data about how a user is interacting with a system through a client: button clicks, session duration, information about the client’s host machine, and so forth. You can use this data to understand how users are interacting with an ecommerce site, or the distribution of browser versions accessing a web-based application.

Performance telemetry
This is not primarily used to analyze user behavior; instead, it provides operators with statistical information about the behavior and performance of system components. Performance data can come from different sources in a distributed system and offers developers a “breadcrumb trail” to follow, connecting cause with effect.

In plainer terms, user telemetry will tell you how long someone hovered their mouse cursor over a Checkout button in an ecommerce application. Performance telemetry will tell you how long it took for that checkout button to load in the first place, and which programs and resources the system utilized along the way.

Underneath user and performance telemetry are different types of signals. A signal is a particular form of telemetry. Event logs are one kind of signal. System metrics are another kind of signal. Continuous profiling is another. These signal types each serve a different purpose, and they are not really interchangeable. You can’t derive all the events that make up a user interaction just by looking at system metrics, and you can’t derive system load just by
The above is a preview of the first 20 pages; the text cuts off at this point.
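To ground the terms resources, transactions, and signals from the last two preview pages, here is a short illustrative sketch (again, not from the book) using the OpenTelemetry Python SDK. The service and attribute values are made up, and console exporters stand in for a real observability backend.

    # Minimal sketch: resources become Resource attributes shared by all telemetry,
    # while traces and metrics are two distinct signals emitted by the same service.
    # "checkout", "web-01", and the span/metric names are hypothetical.
    from opentelemetry import metrics, trace
    from opentelemetry.sdk.metrics import MeterProvider
    from opentelemetry.sdk.metrics.export import (
        ConsoleMetricExporter,
        PeriodicExportingMetricReader,
    )
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

    # Resources: the physical and logical components the telemetry describes.
    resource = Resource.create({"service.name": "checkout", "host.name": "web-01"})

    # Tracing signal, printed to the console instead of a backend.
    tracer_provider = TracerProvider(resource=resource)
    tracer_provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(tracer_provider)

    # Metrics signal, exported periodically to the console.
    reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
    metrics.set_meter_provider(MeterProvider(resource=resource, metric_readers=[reader]))

    tracer = trace.get_tracer("checkout")
    meter = metrics.get_meter("checkout")
    order_counter = meter.create_counter("checkout.orders")

    # A transaction: one user-facing request that produces both signals.
    with tracer.start_as_current_span("place-order"):
        order_counter.add(1, {"result": "ok"})

Because both signals carry the same Resource attributes, a backend can later correlate the span and the metric as coming from the same service instance, which is the kind of coherent, unified graph of information the chapter describes.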
