Distributed Systems with Node.js
Building Enterprise-Ready Backend Services
Thomas Hunter II
Boston Farnham Sebastopol Tokyo Beijing
Distributed Systems with Node.js
by Thomas Hunter II

Copyright © 2021 Thomas Hunter II. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Jennifer Pollock
Development Editor: Corbin Collins
Production Editor: Daniel Elfanbaum
Copyeditor: Piper Editorial LLC
Proofreader: Piper Editorial LLC
Indexer: nSight Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

November 2020: First Edition

Revision History for the First Edition
2020-11-03: First Release
2020-11-12: Second Release
2021-01-29: Third Release

See https://www.oreilly.com/catalog/errata.csp?isbn=9781492077299 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Distributed Systems with Node.js, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-492-07729-9
[LSI]
This book is dedicated to my mother.
Table of Contents

Foreword
Preface

1. Why Distributed?
  The Single-Threaded Nature of JavaScript
  Quick Node.js Overview
  The Node.js Event Loop
  Event Loop Phases
  Code Example
  Event Loop Tips
  Sample Applications
  Service Relationship
  Producer Service
  Consumer Service

2. Protocols
  Request and Response with HTTP
  HTTP Payloads
  HTTP Semantics
  HTTP Compression
  HTTPS / TLS
  JSON over HTTP
  The Dangers of Serializing POJOs
  API Facade with GraphQL
  GraphQL Schema
  Queries and Responses
  GraphQL Producer
  GraphQL Consumer
  RPC with gRPC
  Protocol Buffers
  gRPC Producer
  gRPC Consumer

3. Scaling
  The Cluster Module
  A Simple Example
  Request Dispatching
  Cluster Shortcomings
  Reverse Proxies with HAProxy
  Introduction to HAProxy
  Load Balancing and Health Checks
  Compression
  TLS Termination
  Rate Limiting and Back Pressure
  SLA and Load Testing
  Introduction to Autocannon
  Running a Baseline Load Test
  Reverse Proxy Concerns
  Protocol Concerns
  Coming Up with SLOs

4. Observability
  Environments
  Logging with ELK
  Running ELK via Docker
  Transmitting Logs from Node.js
  Creating a Kibana Dashboard
  Running Ad-Hoc Queries
  Metrics with Graphite, StatsD, and Grafana
  Running via Docker
  Transmitting Metrics from Node.js
  Creating a Grafana Dashboard
  Node.js Health Indicators
  Distributed Request Tracing with Zipkin
  How Does Zipkin Work?
  Running Zipkin via Docker
  Transmitting Traces from Node.js
  Visualizing a Request Tree
  Visualizing Microservice Dependencies
  Health Checks
  Building a Health Check
  Testing the Health Check
  Alerting with Cabot
  Create a Twilio Trial Account
  Running Cabot via Docker
  Creating a Health Check

5. Containers
  Introduction to Docker
  Containerizing a Node.js Service
  Dependency Stage
  Release Stage
  From Image to Container
  Rebuilding and Versioning an Image
  Basic Orchestration with Docker Compose
  Composing Node.js Services
  Internal Docker Registry
  Running the Docker Registry
  Pushing and Pulling to the Registry
  Running a Docker Registry UI

6. Deployments
  Build Pipeline with Travis CI
  Creating a Basic Project
  Configuring Travis CI
  Testing a Pull Request
  Automated Testing
  Unit Tests
  Integration Tests
  Code Coverage Enforcement
  Deploying to Heroku
  Create a Heroku App
  Configure Travis CI
  Deploy Your Application
  Modules, Packages, and SemVer
  Node.js Modules
  SemVer (Semantic Versioning)
  npm Packages and the npm CLI
  Internal npm Registry
  Running Verdaccio
  Configuring npm to Use Verdaccio
  Publishing to Verdaccio

7. Container Orchestration
  Introduction to Kubernetes
  Kubernetes Overview
  Kubernetes Concepts
  Starting Kubernetes
  Getting Started
  Deploying an Application
  Kubectl Subcommands
  Kubectl Configuration Files
  Service Discovery
  Modifying Deployments
  Scaling Application Instances
  Deploying New Application Versions
  Rolling Back Application Deployments

8. Resilience
  The Death of a Node.js Process
  Process Exit
  Exceptions, Rejections, and Emitted Errors
  Signals
  Building Stateless Services
  Avoiding Memory Leaks
  Bounded In-Process Caches
  External Caching with Memcached
  Introducing Memcached
  Running Memcached
  Caching Data with Memcached
  Data Structure Mutations
  Database Connection Resilience
  Running PostgreSQL
  Automatic Reconnection
  Connection Pooling
  Schema Migrations with Knex
  Configuring Knex
  Creating a Schema Migration
  Applying a Migration
  Rolling Back a Migration
  Live Migrations
  Idempotency and Messaging Resilience
  HTTP Retry Logic
  Circuit Breaker Pattern
  Exponential Backoff
  Resilience Testing
  Random Crashes
  Event Loop Pauses
  Random Failed Async Operations

9. Distributed Primitives
  The ID Generation Problem
  Introduction to Redis
  Redis Operations
  Strings
  Lists
  Sets
  Hash
  Sorted Sets
  Generic Commands
  Other Types
  Seeking Atomicity
  Transactions
  Lua Scripting
  Writing a Lua Script File
  Loading the Lua Script
  Tying It All Together

10. Security
  Wrangling Repositories
  Recognizing Attack Surface
  Parameter Checking and Deserialization
  Malicious npm Packages
  Application Configuration
  Environment Variables
  Configuration Files
  Secrets Management
  Upgrading Dependencies
  Automatic Upgrades with GitHub Dependabot
  Manual Upgrades with npm CLI
  Unpatched Vulnerabilities
  Upgrading Node.js
  Node.js LTS Schedule
  Upgrade Approach

A. Installing HAProxy
B. Installing Docker
C. Installing Minikube & Kubectl

Index
Foreword

In the past decade, Node.js has gone from novelty to the de facto platform for new applications. During that period, I have had the opportunity to help thousands of Node.js developers from around the world orient themselves and find their paths to success. I have seen Node.js used for everything. Really: someone even built a low-level bootable operating system with Node.js. At the SFNode meetup I created in San Francisco, we have a star speaker who has spoken more than anyone else. You guessed it: Thomas Hunter II, the author of this book.

While you may be able to do anything with Node.js, there are some really practical things that particularly benefit from being done with Node.js. In today’s cloud-first world, most systems have become distributed systems. In this book and in the countless talks I’ve had the pleasure to see Thomas give at SFNode and around the world, pragmatism reigns supreme. This book is filled with experience-tested, hands-on guidance to get you from where you are today to where you need to be tomorrow.

The JavaScript language enables us as developers to create at the speed of thought. It requires little ceremony, and the code we write is usually simple enough that writing it by hand is more efficient than generating it. This beautiful simplicity of JavaScript is perfectly matched with Node.js. Node, as we frequently refer to it, is intentionally minimal. Ryan Dahl, its creator, wrote Node to build an application server that was an order of magnitude easier and faster than what anyone was used to. The results have exceeded even our wildest dreams. The ease and simplicity of Node.js enable you to create, validate, and innovate in ways that simply weren’t possible 10 years ago.

Before I had Node.js, I was a full stack developer using JavaScript to build interactive web-based experiences and Java to provide APIs and backend services. I would revel in the creative flow of JavaScript, and then have to completely shift gears to translate all of it into an object model for Java. What a waste of time! When I found Node.js, I could finally iterate efficiently and effectively both on the client and the server. I literally dropped everything, sold my house, and moved to San Francisco to work with Node.js.
I built data aggregation systems, social media platforms, and video chat, all with Node.js. Then I helped Netflix, PayPal, Walmart, and even NASA learn how to use the platform effectively. The JavaScript APIs were rarely folks’ biggest challenge. What confused people most was the asynchronous programming model. If you don’t understand the tools you are using, how can you expect to achieve the best results with those tools? Asynchronous programming requires you to think a bit more like a computer system rather than a linear script of consecutive actions. This asynchrony is the heartbeat of a good distributed system.

When Thomas asked me to review the table of contents of this book to make sure he’d covered everything, I noticed that the section on scaling starts with an overview of the cluster module. I immediately flagged it as an area of concern. Cluster was created to enable single-instance concurrency that can be exposed to a single port on a system. I’ve seen folks new to Node.js take this and run with the assumption that since concurrency may be desirable, cluster is the right tool for their needs. In distributed systems, concurrency at the instance level is usually a waste of time. As luck would have it, Thomas and I were on the same page, and this led to a delightful talk at SFNode by our top presenter.

So, as you are building your aptitude as a Node.js developer and as a distributed systems developer, take time to understand the constraints and opportunities in your system. Node.js has incredibly performant I/O capabilities. I’ve seen downstream systems become overwhelmed when old services were removed and replaced with Node.js implementations. The old services had acted as natural rate limiters, and the downstream services had been built to accommodate that. Adding a simple Node.js proxy can fix most issues until the downstream services are updated or replaced.

The ease of development with Node will enable you to try many things. Don’t be afraid to throw out code and start over. Node.js development thrives in iteration. Distributed systems let us isolate and encapsulate logic at a service level, which we then can load balance across to validate whole system performance.

But don’t just take my word for it. The pages in this book show you how to do this most effectively. Have fun and share what you learn along the way.

Dan Shaw (@dshaw)
Founder and CTO, NodeSource
The Node.js Company
Always bet on Node.js
Preface

Between the NodeSchool San Francisco and Ann Arbor PHP MySQL groups, I’ve dedicated several years of my life to teaching others how to program. By now I’ve worked with hundreds of students, often starting with the mundane process of installing required software and configuring it. Afterwards, with a little bit of code and a whole lot of explanation, we get to the part where the student’s program runs and it all just “clicks.” I can always tell when it happens: the student smiles and they discuss the possibilities of their newly acquired skill as if it were a power-up in a video game.

My goal is to re-create that tingle of excitement for you, the reader, throughout this book. Within these pages you’ll find many hands-on examples where you get to run various backing services on your development machine and then interact with them using example Node.js application code. With that comes lots of explanation and small tangents to appease the curious.

Once you’re finished with this book, you will have installed and run many different services and, with each of these services, you will have written Node.js application code to interact with them. This book places a greater emphasis on these interactions than it does on examining Node.js application code.

JavaScript is a powerful language for developing both frontend and backend applications. This makes it too easy to go all-in on just learning the language while shying away from peripheral technologies. The thesis of this book is that we JavaScript engineers benefit greatly by having first-hand experience with technologies that many assume only engineers using more traditional enterprise platforms like Java or .NET are familiar with.

Target Audience

This book won’t teach you how to use Node.js, and to benefit the most from it, you should have already written several Node.js applications and have a concrete understanding of JavaScript. That said, this book does cover some advanced and lesser-known concepts about Node.js and JavaScript, such as “The Single-Threaded Nature of JavaScript” and “The Node.js Event Loop,” both in Chapter 1. You should also be familiar with the basics of HTTP, have used at least one database for persisting state, and know how easy and dangerous it is to maintain state within a running Node.js process.

Perhaps you already work at a company that has infrastructure for running backend services and you’re eager to learn how it works and how your Node.js applications can benefit from it. Or maybe you’ve got a Node.js application that you’re running as a side project and you’re tired of it crashing. You might even be the CTO of a young startup, determined to meet the demands of your growing userbase. If any of these situations sound familiar, then this book is for you.

Goals

Node.js is often used for building frontend web applications. This book doesn’t cover any topics related to frontend development or browser concerns. A wealth of books are already available that cover such content. Instead, the goal of this book is to have you integrate backend Node.js services with various services that support modern distributed systems.

By the time you’re done reading this book, you’ll have an understanding of many technologies required to run Node.js services in a production environment: for example, what it takes to deploy and scale an application, how to make it redundant and resilient to failure, how to reliably communicate with other distributed processes, and how to observe the health of the application.

You won’t become an expert on these systems just by reading this book. The operational work required to tune, shard, and deploy scalable ELK services to production, for example, isn’t touched on. However, you will understand how to run a local ELK instance, send it logs from your Node.js service, and create a dashboard for visualizing the service’s health (this is covered in “Logging with ELK” in Chapter 4).
This book certainly doesn’t cover all of the technology used by your particular employer. Although Chapter 7 discusses Kubernetes, a technology for orchestrating the deployments of application code, your employer may instead use a different solution like Apache Mesos. Or perhaps you rely on a version of Kubernetes in a cloud environment where the underlying implementation is hidden from you. At any rate, by learning about tools in the different layers of a distributed backend service stack, you’ll more easily understand other technology stacks that you may encounter.
Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
  Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
  Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
  Shows commands or other text that should be typed literally by the user.

Constant width italic
  Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/tlhunter/distributed-node.

If you have a technical question or a problem using the code examples, please email bookquestions@oreilly.com.
This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Distributed Systems with Node.js by Thomas Hunter II (O’Reilly). Copyright 2020 Thomas Hunter II, 978-1-492-07729-9.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/dist-nodejs.
Email bookquestions@oreilly.com to comment or ask technical questions about this book.

For news and information about our books and courses, visit http://oreilly.com.

Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://youtube.com/oreillymedia

Acknowledgments

This book was made possible thanks to the detailed technical reviews provided by the following people:

Fernando Larrañaga (@xabadu)
  Fernando is an engineer and open source contributor who has been leading JavaScript and Node.js communities for several years, both in South America and in the US. He’s currently a Senior Software Engineer at Square, and with previous tenures at other major tech companies, such as Twilio and Groupon, he has been developing enterprise-level Node.js applications and scaling web applications used by millions of users for more than seven years.

Bryan English (@bengl)
  Bryan is an open source JavaScript and Rust programmer and enthusiast and has worked on large enterprise systems, instrumentation, and application security. Currently he’s a Senior Open Source Software Engineer at Datadog. He’s used Node.js both professionally and in personal projects since not long after its inception. He is also a Node.js core collaborator and has contributed to Node.js in many ways through several of its Working Groups.

Julián Duque (@julian_duque)
  Julián Duque is a community leader, public speaker, JavaScript/Node.js evangelist, and an official Node.js collaborator (Emeritus). He currently works at Salesforce Heroku as a Sr. Developer Advocate and organizes JSConf and NodeConf Colombia; he also helps organize JSConf México and MedellinJS, the largest JavaScript user group in Colombia with 5,000+ registered members. He is also passionate about education and has been teaching software development fundamentals, JavaScript, and Node.js through different community workshops, professional training engagements, and online platforms such as Platzi.

I’d also like to give a special thanks to those who provided me with guidance and feedback: Dan Shaw (@dshaw), Brad Vogel (@BradVogel), Matteo Collina (@matteocollina), Matt Ranney (@mranney), and Rich Trott (@trott).