Hadoop in Practice, 2nd Edition (Alex Holmes)

Author: Alex Holmes

Science

Hadoop in Practice, Second Edition provides over 100 tested, instantly useful techniques that will help you conquer big data using Hadoop. This revised edition covers changes and new features in the Hadoop core architecture, including MapReduce 2. Brand new chapters cover YARN and integrating Kafka, Impala, and Spark SQL with Hadoop. You'll also get new and updated techniques for Flume, Sqoop, and Mahout, all of which have seen major new versions recently. In short, this is the most practical, up-to-date coverage of Hadoop available anywhere.


📄 Text Preview (First 20 pages)


📄 Page 1
Hadoop in Practice, Second Edition
Alex Holmes
Includes 104 techniques
Manning
www.it-ebooks.info
📄 Page 2
Praise for the First Edition of Hadoop in Practice

A new book from Manning, Hadoop in Practice, is definitely the most modern book on the topic. Important subjects, like what commercial variants such as MapR offer, and the many different releases and APIs get uniquely good coverage in this book.
-- Ted Dunning, Chief Application Architect, MapR Technologies

Comprehensive coverage of advanced Hadoop usage, including high-quality code samples.
-- Chris Nauroth, Senior Staff Software Engineer, The Walt Disney Company

A very pragmatic and broad overview of Hadoop and the Hadoop tools ecosystem, with a wide set of interesting topics that tickle the creative brain.
-- Mark Kemna, Chief Technology Officer, Brilig

A practical introduction to the Hadoop ecosystem.
-- Philipp K. Janert, Principal Value, LLC

This book is the horizontal roof that each of the pillars of individual Hadoop technology books hold. It expertly ties together all the Hadoop ecosystem technologies.
-- Ayon Sinha, Big Data Architect, Britely

I would take this book on my path to the future.
-- Alexey Gayduk, Senior Software Engineer, Grid Dynamics

A high-quality and well-written book that is packed with useful examples. The breadth and detail of the material is by far superior to any other Hadoop reference guide. It is perfect for anyone who likes to learn new tools/technologies while following pragmatic, real-world examples.
-- Amazon reviewer
📄 Page 4
Hadoop in Practice, Second Edition
Alex Holmes
Manning, Shelter Island
📄 Page 5
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

©2015 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
Shelter Island, NY 11964

Development editor: Cynthia Kane
Copyeditor: Andy Carroll
Proofreader: Melody Dolab
Typesetter: Gordan Salinovic
Cover designer: Marija Tudor

ISBN 9781617292224
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – EBM – 19 18 17 16 15 14
📄 Page 6
brief contents

PART 1 BACKGROUND AND FUNDAMENTALS 1
  1 Hadoop in a heartbeat 3
  2 Introduction to YARN 22

PART 2 DATA LOGISTICS 59
  3 Data serialization: working with text and beyond 61
  4 Organizing and optimizing data in HDFS 139
  5 Moving data into and out of Hadoop 174

PART 3 BIG DATA PATTERNS 253
  6 Applying MapReduce patterns to big data 255
  7 Utilizing data structures and algorithms at scale 302
  8 Tuning, debugging, and testing 337

PART 4 BEYOND MAPREDUCE 385
  9 SQL on Hadoop 387
  10 Writing a YARN application 425
📄 Page 8
contents

preface xv
acknowledgments xvii
about this book xviii
about the cover illustration xxiii

PART 1 BACKGROUND AND FUNDAMENTALS 1

1 Hadoop in a heartbeat 3
  1.1 What is Hadoop? 4
    Core Hadoop components 5
    The Hadoop ecosystem 10
    Hardware requirements 11
    Hadoop distributions 12
    Who’s using Hadoop? 14
    Hadoop limitations 15
  1.2 Getting your hands dirty with MapReduce 17
  1.3 Summary 21

2 Introduction to YARN 22
  2.1 YARN overview 23
    Why YARN? 24
    YARN concepts and components 26
    YARN configuration 29
    TECHNIQUE 1 Determining the configuration of your cluster 29
    Interacting with YARN 31
📄 Page 9
    TECHNIQUE 2 Running a command on your YARN cluster 31
    TECHNIQUE 3 Accessing container logs 32
    TECHNIQUE 4 Aggregating container log files 36
    YARN challenges 39
  2.2 YARN and MapReduce 40
    Dissecting a YARN MapReduce application 40
    Configuration 42
    Backward compatibility 46
    TECHNIQUE 5 Writing code that works on Hadoop versions 1 and 2 47
    Running a job 48
    TECHNIQUE 6 Using the command line to run a job 49
    Monitoring running jobs and viewing archived jobs 49
    Uber jobs 50
    TECHNIQUE 7 Running small MapReduce jobs 50
  2.3 YARN applications 52
    NoSQL 53
    Interactive SQL 54
    Graph processing 54
    Real-time data processing 55
    Bulk synchronous parallel 55
    MPI 56
    In-memory 56
    DAG execution 56
  2.4 Summary 57

PART 2 DATA LOGISTICS 59

3 Data serialization: working with text and beyond 61
  3.1 Understanding inputs and outputs in MapReduce 62
    Data input 63
    Data output 66
  3.2 Processing common serialization formats 68
    XML 69
    TECHNIQUE 8 MapReduce and XML 69
    JSON 72
    TECHNIQUE 9 MapReduce and JSON 73
  3.3 Big data serialization formats 76
    Comparing SequenceFile, Protocol Buffers, Thrift, and Avro 76
    SequenceFile 78
    TECHNIQUE 10 Working with SequenceFiles 80
    TECHNIQUE 11 Using SequenceFiles to encode Protocol Buffers 87
    Protocol Buffers 91
    Thrift 92
    Avro 93
    TECHNIQUE 12 Avro’s schema and code generation 93
📄 Page 10
    TECHNIQUE 13 Selecting the appropriate way to use Avro in MapReduce 98
    TECHNIQUE 14 Mixing Avro and non-Avro data in MapReduce 99
    TECHNIQUE 15 Using Avro records in MapReduce 102
    TECHNIQUE 16 Using Avro key/value pairs in MapReduce 104
    TECHNIQUE 17 Controlling how sorting works in MapReduce 108
    TECHNIQUE 18 Avro and Hive 108
    TECHNIQUE 19 Avro and Pig 111
  3.4 Columnar storage 113
    Understanding object models and storage formats 115
    Parquet and the Hadoop ecosystem 116
    Parquet block and page sizes 117
    TECHNIQUE 20 Reading Parquet files via the command line 117
    TECHNIQUE 21 Reading and writing Avro data in Parquet with Java 119
    TECHNIQUE 22 Parquet and MapReduce 120
    TECHNIQUE 23 Parquet and Hive/Impala 125
    TECHNIQUE 24 Pushdown predicates and projection with Parquet 126
    Parquet limitations 128
  3.5 Custom file formats 129
    Input and output formats 129
    TECHNIQUE 25 Writing input and output formats for CSV 129
    The importance of output committing 137
  3.6 Chapter summary 138

4 Organizing and optimizing data in HDFS 139
  4.1 Data organization 140
    Directory and file layout 140
    Data tiers 141
    Partitioning 142
    TECHNIQUE 26 Using MultipleOutputs to partition your data 142
    TECHNIQUE 27 Using a custom MapReduce partitioner 145
    Compacting 148
    TECHNIQUE 28 Using filecrush to compact data 149
    TECHNIQUE 29 Using Avro to store multiple small binary files 151
    Atomic data movement 157
  4.2 Efficient storage with compression 158
    TECHNIQUE 30 Picking the right compression codec for your data 159
📄 Page 11
    TECHNIQUE 31 Compression with HDFS, MapReduce, Pig, and Hive 163
    TECHNIQUE 32 Splittable LZOP with MapReduce, Hive, and Pig 168
  4.3 Chapter summary 173

5 Moving data into and out of Hadoop 174
  5.1 Key elements of data movement 175
  5.2 Moving data into Hadoop 177
    Roll your own ingest 177
    TECHNIQUE 33 Using the CLI to load files 178
    TECHNIQUE 34 Using REST to load files 180
    TECHNIQUE 35 Accessing HDFS from behind a firewall 183
    TECHNIQUE 36 Mounting Hadoop with NFS 186
    TECHNIQUE 37 Using DistCp to copy data within and between clusters 188
    TECHNIQUE 38 Using Java to load files 194
    Continuous movement of log and binary files into HDFS 196
    TECHNIQUE 39 Pushing system log messages into HDFS with Flume 197
    TECHNIQUE 40 An automated mechanism to copy files into HDFS 204
    TECHNIQUE 41 Scheduling regular ingress activities with Oozie 209
    Databases 214
    TECHNIQUE 42 Using Sqoop to import data from MySQL 215
    HBase 227
    TECHNIQUE 43 HBase ingress into HDFS 227
    TECHNIQUE 44 MapReduce with HBase as a data source 230
    Importing data from Kafka 232
  5.3 Moving data into Hadoop 234
    TECHNIQUE 45 Using Camus to copy Avro data from Kafka into HDFS 234
  5.4 Moving data out of Hadoop 241
    Roll your own egress 241
    TECHNIQUE 46 Using the CLI to extract files 241
    TECHNIQUE 47 Using REST to extract files 242
    TECHNIQUE 48 Reading from HDFS when behind a firewall 243
    TECHNIQUE 49 Mounting Hadoop with NFS 243
    TECHNIQUE 50 Using DistCp to copy data out of Hadoop 244
📄 Page 12
    TECHNIQUE 51 Using Java to extract files 245
    Automated file egress 246
    TECHNIQUE 52 An automated mechanism to export files from HDFS 246
    Databases 247
    TECHNIQUE 53 Using Sqoop to export data to MySQL 247
    NoSQL 251
  5.5 Chapter summary 252

PART 3 BIG DATA PATTERNS 253

6 Applying MapReduce patterns to big data 255
  6.1 Joining 256
    TECHNIQUE 54 Picking the best join strategy for your data 257
    TECHNIQUE 55 Filters, projections, and pushdowns 259
    Map-side joins 260
    TECHNIQUE 56 Joining data where one dataset can fit into memory 261
    TECHNIQUE 57 Performing a semi-join on large datasets 264
    TECHNIQUE 58 Joining on presorted and prepartitioned data 269
    Reduce-side joins 271
    TECHNIQUE 59 A basic repartition join 271
    TECHNIQUE 60 Optimizing the repartition join 275
    TECHNIQUE 61 Using Bloom filters to cut down on shuffled data 279
    Data skew in reduce-side joins 283
    TECHNIQUE 62 Joining large datasets with high join-key cardinality 284
    TECHNIQUE 63 Handling skews generated by the hash partitioner 286
  6.2 Sorting 287
    Secondary sort 288
    TECHNIQUE 64 Implementing a secondary sort 289
    Total order sorting 294
    TECHNIQUE 65 Sorting keys across multiple reducers 294
  6.3 Sampling 297
    TECHNIQUE 66 Writing a reservoir-sampling InputFormat 297
  6.4 Chapter summary 301
📄 Page 13
7 Utilizing data structures and algorithms at scale 302
  7.1 Modeling data and solving problems with graphs 303
    Modeling graphs 304
    Shortest-path algorithm 304
    TECHNIQUE 67 Find the shortest distance between two users 305
    Friends-of-friends algorithm 313
    TECHNIQUE 68 Calculating FoFs 313
    Using Giraph to calculate PageRank over a web graph 319
  7.2 Modeling data and solving problems with graphs 321
    TECHNIQUE 69 Calculate PageRank over a web graph 322
  7.3 Bloom filters 326
    TECHNIQUE 70 Parallelized Bloom filter creation in MapReduce 328
  7.4 HyperLogLog 333
    A brief introduction to HyperLogLog 333
    TECHNIQUE 71 Using HyperLogLog to calculate unique counts 335
  7.5 Chapter summary 336

8 Tuning, debugging, and testing 337
  8.1 Measure, measure, measure 338
  8.2 Tuning MapReduce 339
    Common inefficiencies in MapReduce jobs 339
    TECHNIQUE 72 Viewing job statistics 340
    Map optimizations 343
    TECHNIQUE 73 Data locality 343
    TECHNIQUE 74 Dealing with a large number of input splits 344
    TECHNIQUE 75 Generating input splits in the cluster with YARN 346
    Shuffle optimizations 347
    TECHNIQUE 76 Using the combiner 347
    TECHNIQUE 77 Blazingly fast sorting with binary comparators 349
    TECHNIQUE 78 Tuning the shuffle internals 353
    Reducer optimizations 356
    TECHNIQUE 79 Too few or too many reducers 356
    General tuning tips 357
📄 Page 14
    TECHNIQUE 80 Using stack dumps to discover unoptimized user code 358
    TECHNIQUE 81 Profiling your map and reduce tasks 360
  8.3 Debugging 362
    Accessing container log output 362
    TECHNIQUE 82 Examining task logs 362
    Accessing container start scripts 363
    TECHNIQUE 83 Figuring out the container startup command 363
    Debugging OutOfMemory errors 365
    TECHNIQUE 84 Force container JVMs to generate a heap dump 365
    MapReduce coding guidelines for effective debugging 365
    TECHNIQUE 85 Augmenting MapReduce code for better debugging 365
  8.4 Testing MapReduce jobs 368
    Essential ingredients for effective unit testing 368
    MRUnit 370
    TECHNIQUE 86 Using MRUnit to unit-test MapReduce 371
    LocalJobRunner 378
    TECHNIQUE 87 Heavyweight job testing with the LocalJobRunner 378
    MiniMRYarnCluster 381
    TECHNIQUE 88 Using MiniMRYarnCluster to test your jobs 381
    Integration and QA testing 382
  8.5 Chapter summary 383

PART 4 BEYOND MAPREDUCE 385

9 SQL on Hadoop 387
  9.1 Hive 388
    Hive basics 388
    Reading and writing data 391
    TECHNIQUE 89 Working with text files 391
    TECHNIQUE 90 Exporting data to local disk 395
    User-defined functions in Hive 396
    TECHNIQUE 91 Writing UDFs 396
    Hive performance 399
    TECHNIQUE 92 Partitioning 399
    TECHNIQUE 93 Tuning Hive joins 404
📄 Page 15
  9.2 Impala 409
    Impala vs. Hive 410
    Impala basics 410
    TECHNIQUE 94 Working with text 410
    TECHNIQUE 95 Working with Parquet 412
    TECHNIQUE 96 Refreshing metadata 413
    User-defined functions in Impala 414
    TECHNIQUE 97 Executing Hive UDFs in Impala 415
  9.3 Spark SQL 416
    Spark 101 417
    Spark on Hadoop 419
    SQL with Spark 419
    TECHNIQUE 98 Calculating stock averages with Spark SQL 420
    TECHNIQUE 99 Language-integrated queries 422
    TECHNIQUE 100 Hive and Spark SQL 423
  9.4 Chapter summary 423

10 Writing a YARN application 425
  10.1 Fundamentals of building a YARN application 426
    Actors 426
    The mechanics of a YARN application 427
  10.2 Building a YARN application to collect cluster statistics 429
    TECHNIQUE 101 A bare-bones YARN client 429
    TECHNIQUE 102 A bare-bones ApplicationMaster 434
    TECHNIQUE 103 Running the application and accessing logs 438
    TECHNIQUE 104 Debugging using an unmanaged application master 440
  10.3 Additional YARN application capabilities 443
    RPC between components 443
    Service discovery 444
    Checkpointing application progress 444
    Avoiding split-brain 444
    Long-running applications 444
    Security 445
  10.4 YARN programming abstractions 445
    Twill 446
    Spring 448
    REEF 450
    Picking a YARN API abstraction 450
  10.5 Summary 450

appendix Installing Hadoop and friends 451
index 475

bonus chapters available for download from www.manning.com/holmes2
chapter 11 Integrating R and Hadoop for statistics and more
chapter 12 Predictive analytics with Mahout
📄 Page 16
preface

I first encountered Hadoop in the fall of 2008 when I was working on an internet crawl-and-analysis project at Verisign. We were making discoveries similar to those that Doug Cutting and others at Nutch had made several years earlier about how to efficiently store and manage terabytes of crawl-and-analyzed data. At the time, we were getting by with our homegrown distributed system, but the influx of a new data stream and requirements to join that stream with our crawl data couldn’t be supported by our existing system in the required timeline.

After some research, we came across the Hadoop project, which seemed to be a perfect fit for our needs: it supported storing large volumes of data and provided a compute mechanism to combine them. Within a few months, we built and deployed a MapReduce application encompassing a number of MapReduce jobs, woven together with our own MapReduce workflow management system, onto a small cluster of 18 nodes. It was a revelation to observe our MapReduce jobs crunching through our data in minutes. Of course, what we weren’t expecting was the amount of time that we would spend debugging and performance-tuning our MapReduce jobs. Not to mention the new roles we took on as production administrators; the biggest surprise in this role was the number of disk failures we encountered during those first few months supporting production.

As our experience and comfort level with Hadoop grew, we continued to build more of our functionality using Hadoop to help with our scaling challenges. We also started to evangelize the use of Hadoop within our organization and helped kick-start other projects that were also facing big data challenges.
📄 Page 17
The greatest challenge we faced when working with Hadoop, and specifically MapReduce, was relearning how to solve problems with it. MapReduce is its own flavor of parallel programming, and it’s quite different from the in-JVM programming that we were accustomed to. The first big hurdle was training our brains to think MapReduce, a topic which the book Hadoop in Action by Chuck Lam (Manning Publications, 2010) covers well.

After one is used to thinking in MapReduce, the next challenge is typically related to the logistics of working with Hadoop, such as how to move data in and out of HDFS and effective and efficient ways to work with data in Hadoop. These areas of Hadoop haven’t received much coverage, and that’s what attracted me to the potential of this book: the chance to go beyond the fundamental word-count Hadoop uses and cover some of the trickier and dirtier aspects of Hadoop.

As I’m sure many authors have experienced, I went into this project confidently believing that writing this book was just a matter of transferring my experiences onto paper. Boy, did I get a reality check, but not altogether an unpleasant one, because writing introduced me to new approaches and tools that ultimately helped better my own Hadoop abilities. I hope that you get as much out of reading this book as I did writing it.
📄 Page 18
acknowledgments

First and foremost, I want to thank Michael Noll, who pushed me to write this book. He provided invaluable insights into how to structure the content of the book, reviewed my early chapter drafts, and helped mold the book. I can’t express how much his support and encouragement have helped me throughout the process.

I’m also indebted to Cynthia Kane, my development editor at Manning, who coached me through writing this book and provided invaluable feedback on my work. Among the many notable “aha!” moments I had when working with Cynthia, the biggest one was when she steered me into using visual aids to help explain some of the complex concepts in this book.

All of the Manning staff were a pleasure to work with, and a special shout-out goes to Troy Mott, Nick Chase, Tara Walsh, Bob Herbstman, Michael Stephens, Marjan Bace, Maureen Spencer, and Kevin Sullivan.

I also want to say a big thank you to all the reviewers of this book: Adam Kawa, Andrea Tarocchi, Anna Lahoud, Arthur Zubarev, Edward Ribeiro, Fillipe Massuda, Gerd Koenig, Jeet Marwah, Leon Portman, Mohamed Diouf, Muthuswamy Manigandan, Rodrigo Abreu, and Serega Sheypack. Jonathan Siedman, the primary technical reviewer, did a great job of reviewing the entire book.

Many thanks to Josh Wills, the creator of Crunch, who kindly looked over the chapter that covered that topic. And more thanks go to Josh Patterson, who reviewed my Mahout chapter.

Finally, a special thanks to my wife, Michal, who had to put up with a cranky husband working crazy hours. She was a source of encouragement throughout the entire process.
📄 Page 19
about this book

Doug Cutting, the creator of Hadoop, likes to call Hadoop the kernel for big data, and I would tend to agree. With its distributed storage and compute capabilities, Hadoop is fundamentally an enabling technology for working with huge datasets. Hadoop provides a bridge between structured (RDBMS) and unstructured (log files, XML, text) data and allows these datasets to be easily joined together. This has evolved from traditional use cases, such as combining OLTP and log files, to more sophisticated uses, such as using Hadoop for data warehousing (exemplified by Facebook) and the field of data science, which studies and makes new discoveries about data.

This book collects a number of intermediate and advanced Hadoop examples and presents them in a problem/solution format. Each technique addresses a specific task you’ll face, like using Flume to move log files into Hadoop or using Mahout for predictive analysis. Each problem is explored step by step, and as you work through them, you’ll find yourself growing more comfortable with Hadoop and at home in the world of big data.

This hands-on book targets users who have some practical experience with Hadoop and understand the basic concepts of MapReduce and HDFS. Manning’s Hadoop in Action by Chuck Lam contains the necessary prerequisites to understand and apply the techniques covered in this book.

Many techniques in this book are Java-based, which means readers are expected to possess an intermediate-level knowledge of Java. An excellent text for all levels of Java users is Effective Java, Second Edition by Joshua Bloch (Addison-Wesley, 2008).
📄 Page 20
Roadmap

This book has 10 chapters divided into four parts.

Part 1 contains two chapters that form the introduction to this book. They review Hadoop basics and look at how to get Hadoop up and running on a single host. YARN, which is new in Hadoop version 2, is also examined, and some operational tips are provided for performing basic functions in YARN.

Part 2, “Data logistics,” consists of three chapters that cover the techniques and tools required to deal with data fundamentals: how to work with various data formats, how to organize and optimize your data, and how to get data into and out of Hadoop. Picking the right format for your data and determining how to organize data in HDFS are the first items you’ll need to address when working with Hadoop, and they’re covered in chapters 3 and 4 respectively. Getting data into Hadoop is one of the bigger hurdles commonly encountered when working with Hadoop, and chapter 5 is dedicated to looking at a variety of tools that work with common enterprise data sources.

Part 3 is called “Big data patterns,” and it looks at techniques to help you work effectively with large volumes of data. Chapter 6 covers MapReduce patterns for big data, including joining, sorting, and sampling. Chapter 7 looks at more advanced data structures and algorithms such as graph processing and using HyperLogLog for working with large datasets. Chapter 8 looks at how to tune, debug, and test MapReduce jobs, and it also covers a number of techniques to help make your jobs run faster.

Part 4 is titled “Beyond MapReduce,” and it examines a number of technologies that make it easier to work with Hadoop. Chapter 9 covers the most prevalent and promising SQL technologies for data processing on Hadoop, and Hive, Impala, and Spark SQL are examined. The final chapter looks at how to write your own YARN application, and it provides some insights into some of the more advanced features you can use in your applications.

The appendix covers instructions for the source code that accompanies this book, as well as installation instructions for Hadoop and all the other related technologies covered in the book.

Finally, there are two bonus chapters available from the publisher’s website at www.manning.com/HadoopinPracticeSecondEdition: chapter 11 “Integrating R and Hadoop for statistics and more” and chapter 12 “Predictive analytics with Mahout.”

What’s new in the second edition?

This second edition covers Hadoop 2, which at the time of writing is the current production-ready version of Hadoop. The first edition of the book covered Hadoop 0.22 (Hadoop 1 wasn’t yet out), and Hadoop 2 has turned the world upside-down and opened up the Hadoop platform to processing paradigms beyond MapReduce. YARN, the new scheduler and application manager in Hadoop 2, is complex and new to the community, which prompted me to dedicate a new chapter 2 to covering YARN basics and to discussing how MapReduce now functions as a YARN application.