Uploader: 高宏飞
Shared on 2026-05-05

Author: Tim Roughgarden

Fourth book in a series that provides an accessible, no-nonsense, and programming-language-agnostic introduction to algorithms. Hints or solutions are included for all quizzes and problems, and a series of YouTube videos by the author accompanies the book. Part 4 covers algorithmic tools for tackling NP-hard problems (heuristic algorithms, local search, dynamic programming, MIP and SAT solvers) and techniques for quickly recognizing NP-hard problems in the wild.

ISBN: 0999282964
Publish Year: 2020
Language: English
Pages: 274
File Format: PDF
File Size: 11.8 MB
Text Preview (First 20 pages)

Algorithms Illuminated, Part 4: Algorithms for NP-Hard Problems
Tim Roughgarden
© 2020 by Tim Roughgarden
All rights reserved. No portion of this book may be reproduced in any form without permission from the publisher, except as permitted by U.S. copyright law.
First Edition
Cover image: Norman Zammitt, Blue Burning, 1982; acrylic on canvas; 84 x 168 1/2 in. (213.36 x 427.99 cm); San Francisco Museum of Modern Art, Gift of Judge and Mrs. William J. Lasarow; © Estate of Norman Zammitt; photograph: Katherine Du Tiel
ISBN: 978-0-9992829-6-0 (Paperback)
ISBN: 978-0-9992829-7-7 (ebook)
Library of Congress Control Number: 2017914282
Soundlikeyourself Publishing, LLC
New York, NY
soundlikeyourselfpublishing@gmail.com
www.algorithmsilluminated.org
Contents

Preface  v

19  What Is NP-Hardness?  1
    19.1  MST vs. TSP: An Algorithmic Mystery  1
    19.2  Possible Levels of Expertise  7
    19.3  Easy and Hard Problems  9
    19.4  Algorithmic Strategies for NP-Hard Problems  17
    19.5  Proving NP-Hardness: A Simple Recipe  23
    19.6  Rookie Mistakes and Acceptable Inaccuracies  33
    Problems  37

20  Compromising on Correctness: Efficient Inexact Algorithms  41
    20.1  Makespan Minimization  41
    20.2  Maximum Coverage  55
    *20.3  Influence Maximization  66
    20.4  The 2-OPT Heuristic Algorithm for the TSP  75
    20.5  Principles of Local Search  82
    Problems  94

21  Compromising on Speed: Exact Inefficient Algorithms  105
    21.1  The Bellman-Held-Karp Algorithm for the TSP  105
    *21.2  Finding Long Paths by Color Coding  114
    21.3  Problem-Specific Algorithms vs. Magic Boxes  127
    21.4  Mixed Integer Programming Solvers  129
    21.5  Satisfiability Solvers  134
    Problems  140

22  Proving Problems NP-Hard  148
    22.1  Reductions Revisited  148
    22.2  3-SAT and the Cook-Levin Theorem  150
    22.3  The Big Picture  152
    22.4  A Template for Reductions  156
    22.5  Independent Set Is NP-Hard  158
    *22.6  Directed Hamiltonian Path Is NP-Hard  163
    22.7  The TSP Is NP-Hard  169
    22.8  Subset Sum Is NP-Hard  172
    Problems  178

23  P, NP, and All That  182
    *23.1  Amassing Evidence of Intractability  183
    *23.2  Decision, Search, and Optimization  185
    *23.3  NP: Problems with Easily Recognized Solutions  186
    *23.4  The P ≠ NP Conjecture  192
    *23.5  The Exponential Time Hypothesis  195
    *23.6  NP-Completeness  200
    Problems  205

24  Case Study: The FCC Incentive Auction  208
    24.1  Repurposing Wireless Spectrum  208
    24.2  Greedy Heuristics for Buying Back Licenses  211
    24.3  Feasibility Checking  219
    24.4  Implementation as a Descending Clock Auction  226
    24.5  The Final Outcome  231
    Problems  233

Epilogue: A Field Guide to Algorithm Design  236
Hints and Solutions  239
Index  251
Preface

This book is the fourth in a series based on my online algorithms courses that have been running regularly since 2012, which in turn are based on an undergraduate course that I taught many times at Stanford University. Part 4 assumes at least some familiarity with asymptotic analysis and big-O notation, graph search and shortest-path algorithms, greedy algorithms, and dynamic programming (all covered in Parts 1–3).

What We’ll Cover in This Book

Algorithms Illuminated, Part 4 is all about NP-hard problems and what to do about them.

Algorithmic tools for tackling NP-hard problems. Many real-world problems are “NP-hard” and appear unsolvable by the types of always-correct and always-fast algorithms that have starred in the first three parts of this book series. When an NP-hard problem shows up in your own work, you must compromise on either correctness or speed. We’ll see techniques old (like greedy algorithms) and new (like local search) for devising fast heuristic algorithms that are “approximately correct,” with applications to scheduling, influence maximization in social networks, and the traveling salesman problem. We’ll also cover techniques old (like dynamic programming) and new (like MIP and SAT solvers) for developing correct algorithms that improve dramatically on exhaustive search; applications here include the traveling salesman problem (again), finding signaling pathways in biological networks, and television station repacking in a recent and high-stakes spectrum auction in the United States.

Recognizing NP-hard problems. This book will also train you to quickly recognize an NP-hard problem so that you don’t inadvertently waste time trying to design a too-good-to-be-true algorithm for it. You’ll acquire familiarity with many famous and basic NP-hard problems, ranging from satisfiability to graph coloring to the Hamiltonian path problem. Through practice, you’ll learn the tricks of the trade in proving problems NP-hard via reductions.

For a more detailed look into the book’s contents, check out the “Upshot” sections that conclude each chapter and highlight the most important points. The “Field Guide to Algorithm Design” on page 236 provides a bird’s-eye view of how the topics of this book fit into the bigger algorithmic picture. The starred sections of the book are the most advanced ones. The time-constrained reader can skip these sections on a first reading without any loss of continuity.

Topics covered in the first three parts. Algorithms Illuminated, Part 1 covers asymptotic notation (big-O notation and its close cousins), divide-and-conquer algorithms and the master method, randomized QuickSort and its analysis, and linear-time selection algorithms. Part 2 is about data structures (heaps, balanced search trees, hash tables, bloom filters), graph primitives (breadth- and depth-first search, connectivity, shortest paths), and their applications (ranging from deduplication to social network analysis). Part 3 focuses on greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, shortest paths, optimal search trees).

Skills You’ll Learn From This Book Series

Mastering algorithms takes time and effort. Why bother?

Become a better programmer. You’ll learn several blazingly fast subroutines for processing data as well as several useful data structures for organizing data that you can deploy directly in your own programs. Implementing and using these algorithms will stretch and improve your programming skills.
You’ll also learn general algorithm design paradigms that are relevant to many different problems across different domains, as well as tools for predicting the performance of such algorithms. These “algorithmic design patterns” can help you come up with new algorithms for problems that arise in your own work.
Sharpen your analytical skills. You’ll get lots of practice describing and reasoning about algorithms. Through mathematical analysis, you’ll gain a deep understanding of the specific algorithms and data structures that these books cover. You’ll acquire facility with several mathematical techniques that are broadly useful for analyzing algorithms.

Think algorithmically. After you learn about algorithms, you’ll start seeing them everywhere, whether you’re riding an elevator, watching a flock of birds, managing your investment portfolio, or even watching an infant learn. Algorithmic thinking is increasingly useful and prevalent in disciplines outside of computer science, including biology, statistics, and economics.

Literacy with computer science’s greatest hits. Studying algorithms can feel like watching a highlight reel of many of the greatest hits from the last sixty years of computer science. No longer will you feel excluded at that computer science cocktail party when someone cracks a joke about Dijkstra’s algorithm. After reading these books, you’ll know exactly what they mean.

Ace your technical interviews. Over the years, countless students have regaled me with stories about how mastering the concepts in these books enabled them to ace every technical interview question they were ever asked.

How These Books Are Different

This series of books has only one goal: to teach the basics of algorithms in the most accessible way possible. Think of them as a transcript of what an expert algorithms tutor would say to you over a series of one-on-one lessons. There are a number of excellent, more traditional and encyclopedic textbooks about algorithms, any of which usefully complement this book series with additional details, problems, and topics. I encourage you to explore and find your own favorites.
There are also several books that, unlike these books, cater to programmers looking for ready-made algorithm implementations in a specific programming language. Many such implementations are freely available on the Web as well.
Who Are You?

The whole point of these books and the online courses upon which they are based is to be as widely and easily accessible as possible. People of all ages, backgrounds, and walks of life are well represented in my online courses, and there are large numbers of students (high-school, college, etc.), software engineers (both current and aspiring), scientists, and professionals hailing from all corners of the world.

This book is not an introduction to programming, and ideally you’ve acquired basic programming skills in a standard language (like Java, Python, C, Scala, Haskell, etc.). If you need to beef up your programming skills, there are several outstanding free online courses that teach basic programming.

We also use mathematical analysis as needed to understand how and why algorithms really work. The freely available book Mathematics for Computer Science, by Eric Lehman, F. Thomson Leighton, and Albert R. Meyer, is an excellent and entertaining refresher on mathematical notation (like Σ and ∀), the basics of proofs (induction, contradiction, etc.), discrete probability, and much more.

Additional Resources

These books are based on online courses that are currently running on the Coursera and EdX platforms. I’ve made several resources available to help you replicate as much of the online course experience as you like.

Videos. If you’re more in the mood to watch and listen than to read, check out the YouTube video playlists available at www.algorithmsilluminated.org. These videos cover all the topics in this book series, as well as additional advanced topics. I hope they exude a contagious enthusiasm for algorithms that, alas, is impossible to replicate fully on the printed page.

Quizzes. How can you know if you’re truly absorbing the concepts in this book? Quizzes with solutions and explanations are scattered throughout the text; when you encounter one, I encourage you to pause and think about the answer before reading on.
End-of-chapter problems. At the end of each chapter, you’ll find several relatively straightforward questions that test your understanding, followed by harder and more open-ended challenge problems. Hints or solutions to all of these problems (as indicated by an “(H)” or “(S),” respectively) are included at the end of the book. Readers can interact with me and each other about the end-of-chapter problems through the book’s discussion forum (see below).

Programming problems. Several of the chapters conclude with suggested programming projects whose goal is to help you develop a detailed understanding of an algorithm by creating your own working implementation of it. Data sets, along with test cases and their solutions, can be found at www.algorithmsilluminated.org.

Discussion forums. A big reason for the success of online courses is the opportunities they provide for participants to help each other understand the course material and debug programs through discussion forums. Readers of these books have the same opportunity via the forums available at www.algorithmsilluminated.org.

Acknowledgments

These books would not exist without the passion and hunger supplied by the hundreds of thousands of participants in my algorithms courses over the years. I am particularly grateful to those who supplied detailed feedback on an earlier draft of this book: Tonya Blust, Yuan Cao, Leslie Damon, Tyler Dae Devlin, Roman Gafiteanu, Blanca Huergo, Jim Humelsine, Tim Kearns, Vladimir Kokshenev, Bayram Kuliyev, Clayton Wong, Lexin Ye, and Daniel Zingaro. Thanks also to several experts who provided technical advice: Amir Abboud, Vincent Conitzer, Christian Kroer, Aviad Rubinstein, and Ilya Segal.

I always appreciate suggestions and corrections from readers. These are best communicated through the discussion forums mentioned above.

Tim Roughgarden
New York, NY
June 2020
Chapter 19

What Is NP-Hardness?

Introductory books on algorithms, including Parts 1–3 of this series, suffer from selection bias. They focus on computational problems that are solvable by clever, fast algorithms—after all, what’s more fun and empowering to learn than an ingenious algorithmic short-cut? The good news is that many fundamental and practically relevant problems fall into this category: sorting, graph search, shortest paths, Huffman codes, minimum spanning trees, sequence alignment, and so on. But it would be fraudulent to teach you only this cherry-picked collection of problems while ignoring the spectre of computational intractability that haunts the serious algorithm designer or programmer. Sadly, there are many important computational problems, including ones likely to show up in your own projects, for which no fast algorithms are known. Even worse, we can’t expect any future algorithmic breakthroughs for these problems, as they are widely believed to be intrinsically difficult and unsolvable by any fast algorithm.

Newly aware of this stark reality, two questions immediately come to mind. First, how can you recognize such hard problems when they appear in your own work, so that you can adjust your expectations accordingly and avoid wasting time looking for a too-good-to-be-true algorithm? Second, when such a problem is important to your application, how should you revise your ambitions, and what algorithmic tools can you apply to achieve them? This book will equip you with thorough answers to both questions.

19.1 MST vs. TSP: An Algorithmic Mystery

Hard computational problems can look a lot like easy ones, and telling them apart requires a trained eye. To set the stage, let’s rendezvous with a familiar friend (the minimum spanning tree problem) and meet its more demanding cousin (the traveling salesman problem).
19.1.1 The Minimum Spanning Tree Problem

One famous computational problem solvable by a blazingly fast algorithm is the minimum spanning tree (MST) problem (covered in Chapter 15 of Part 3).[1]

Problem: Minimum Spanning Tree (MST)

Input: A connected undirected graph G = (V, E) and a real-valued cost c_e for each edge e ∈ E.

Output: A spanning tree T ⊆ E of G with the minimum-possible sum Σ_{e∈T} c_e of edge costs.

Recall that a graph G = (V, E) is connected if, for every pair v, w ∈ V of vertices, the graph contains a path from v to w. A spanning tree of G is a subset T ⊆ E of edges such that the subgraph (V, T) is both connected and acyclic. For example, in the graph

[figure: a graph on the four vertices a, b, c, d, with edge costs 1 through 5]

the minimum spanning tree comprises the edges (a, b), (b, d), and (a, c), for an overall cost of 7. A graph can have an exponential number of spanning trees, so exhaustive search is out of the question for all but the smallest graphs.[2] But the MST problem can be solved by clever fast algorithms,

[1] To review, a graph G = (V, E) has two ingredients: a set V of vertices and a set E of edges. In an undirected graph, each edge e ∈ E corresponds to an unordered pair {v, w} of vertices (written as e = (v, w) or e = (w, v)). In a directed graph, each edge (v, w) is an ordered pair, with the edge directed from v to w. The numbers |V| and |E| of vertices and edges are usually denoted by n and m, respectively.

[2] For example, Cayley’s formula is a famous result from combinatorics stating that the n-vertex complete graph (in which all the C(n, 2) possible edges are present) has exactly n^(n−2) different spanning trees. This is bigger than the estimated number of atoms in the known universe when n ≥ 50.
such as Prim’s and Kruskal’s algorithms. Deploying appropriate data structures (heaps and union-find, respectively), both algorithms have blazingly fast implementations, with a running time of O((m + n) log n), where m and n are the number of edges and vertices of the input graph, respectively.

19.1.2 The Traveling Salesman Problem

Another famous problem, absent from Parts 1–3 but prominent in this book, is the traveling salesman problem (TSP). Its definition is almost the same as that of the MST problem, except with tours—simple cycles that span all vertices—playing the role of spanning trees.

Problem: Traveling Salesman Problem (TSP)

Input: A complete undirected graph G = (V, E) and a real-valued cost c_e for each edge e ∈ E.[3]

Output: A tour T ⊆ E of G with the minimum-possible sum Σ_{e∈T} c_e of edge costs.

Formally, a tour is a cycle that visits every vertex exactly once (with two edges incident to each vertex).

Quiz 19.1

In an instance G = (V, E) of the TSP with n ≥ 3 vertices, how many distinct tours T ⊆ E are there? (In the answers below, n! = n · (n − 1) · (n − 2) · · · 2 · 1 denotes the factorial function.)

a) 2^n

b) (n − 1)!/2

[3] In a complete graph, all C(n, 2) possible edges are present. The assumption that the graph is complete is without loss of generality, as an arbitrary input graph can be harmlessly turned into a complete graph by adding in all the missing edges and giving them very high costs.
c) (n − 1)!

d) n!

(See Section 19.1.4 for the solution and discussion.)

If all else fails, the TSP can be solved by exhaustively enumerating all of the (finitely many) tours and remembering the best one. Try exhaustive search out on a small example.

Quiz 19.2

What is the minimum sum of edge costs of a tour of the following graph? (Each edge is labeled with its cost.)

[figure: a complete graph on the four vertices a, b, c, d, with each edge labeled with a cost from 1 to 6]

a) 12

b) 13

c) 14

d) 15

(See Section 19.1.4 for the solution and discussion.)

The TSP can be feasibly solved by exhaustive search for only the smallest of instances. Can we do better? Could there be, analogous to the MST problem, an algorithm that magically homes in on the minimum-cost needle in the exponential-size haystack of traveling salesman tours? Despite the superficial similarity of the statements of the two problems, the TSP appears to be far more difficult to solve than the MST problem.
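To make the contrast concrete, here is a minimal sketch of Kruskal’s algorithm with union-find, one of the fast MST algorithms mentioned in Section 19.1.1. Note an assumption: the example figure did not survive text extraction, so the edge-cost assignment below is a reconstruction, chosen so that the stated MST {(a, b), (b, d), (a, c)} with total cost 7 comes out.

```python
# Kruskal's algorithm with union-find: scan edges in increasing cost order,
# adding an edge whenever it connects two different components.
# Edge costs are an ASSUMED reconstruction of the Section 19.1.1 figure.

def kruskal_mst(vertices, edges):
    """Return (total_cost, mst_edges) for a connected undirected graph.

    edges: list of (cost, u, v) tuples.
    """
    parent = {v: v for v in vertices}

    def find(x):                      # find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, mst = 0, []
    for cost, u, v in sorted(edges):  # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                  # different components: edge is safe
            parent[ru] = rv           # union the two components
            mst.append((u, v))
            total += cost
    return total, mst

edges = [(1, "a", "b"), (2, "b", "d"), (3, "a", "d"),
         (4, "a", "c"), (5, "c", "d")]
cost, mst = kruskal_mst("abcd", edges)
print(cost, sorted(mst))  # 7 [('a', 'b'), ('a', 'c'), ('b', 'd')]
```

Whatever the actual costs in the figure, the point stands: the greedy scan examines each edge once, so no enumeration of the (exponentially many) spanning trees is ever needed.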
19.1.3 Trying and Failing to Solve the TSP

I could tell you a cheesy story about, um, a traveling salesman, but this would do a disservice to the TSP, which is actually quite fundamental. Whenever you have a bunch of tasks to complete in a sequence, with the cost or time for carrying out a task dependent on the preceding task, you’re talking about the TSP in disguise. For example, tasks could represent cars to be assembled in a factory, with the time required to assemble a car equal to a fixed cost (for assembly) plus a setup cost that depends on how different the factory configurations are for this and the previous car. Assembling all the cars as quickly as possible boils down to minimizing the sum of the setup costs, which is exactly the TSP. For a very different application, imagine that you’ve collected a bunch of overlapping fragments of a genome and would like to reverse engineer their most plausible ordering. Given a “plausibility measure” that assigns a cost to each fragment pair (for example, derived from the length of their longest common substring), this ordering problem also boils down to the TSP.[4]

Seduced by the practical applications and aesthetic appeal of the TSP, many of the greatest minds in optimization have, since at least the early 1950s, devoted a tremendous amount of effort and computation to solving large-scale instances of the TSP.[5] Despite the decades and intellectual firepower involved:

Fact: As of this writing (in 2020), there is no known fast algorithm for the TSP.

What do we mean by a “fast” algorithm? Back in Part 1, we agreed that:

[4] Both applications are arguably better modeled as traveling salesman path problems, in which the goal is to compute a minimum-cost cycle-free path that visits every vertex (without going back to the starting vertex). Any algorithm solving the TSP can be easily converted into one solving the path version of the problem, and vice versa (Problem 19.7).
[5] Readers curious about the history or additional applications of the TSP should check out the first four chapters of the book The Traveling Salesman Problem: A Computational Study, by David L. Applegate, Robert E. Bixby, Vašek Chvátal, and William J. Cook (Princeton University Press, 2006).
A “fast algorithm” is an algorithm whose worst-case running time grows slowly with the input size.

And what do we mean by “grows slowly”? For much of this book series, the holy grail has been algorithms that run in linear or almost-linear time. Forget about such blazingly fast algorithms—for the TSP, no one even knows of an algorithm that always runs in O(n^100) time on n-vertex instances, or even O(n^10000) time.

There are two competing explanations for the dismal state-of-the-art: (i) there is a fast algorithm for the TSP but no one’s been smart enough to find it yet; or (ii) no such algorithm exists. We do not know which explanation is correct, though most experts believe in the second one.

Speculation: No fast algorithm for the TSP exists.

As early as 1967, Jack Edmonds wrote:

I conjecture that there is no good algorithm for the traveling saleman [sic] problem. My reasons are the same as for any mathematical conjecture: (1) It is a legitimate mathematical possibility, and (2) I do not know.[6]

Unfortunately, the curse of intractability is not confined to the TSP. We’ll see that many other practically relevant problems are similarly afflicted.

19.1.4 Solutions to Quizzes 19.1–19.2

Solution to Quiz 19.1

Correct answer: (b). There is an intuitive correspondence between vertex orderings (of which there are n!) and tours (which visit the vertices once each, in some order), so answer (d) is a natural guess. However, this correspondence counts each tour in 2n different ways:

[6] From the paper “Optimum Branchings,” by Jack Edmonds (Journal of Research of the National Bureau of Standards, Series B, 1967). By a “good” algorithm, Edmonds means an algorithm with a running time bounded above by some polynomial function of the input size.
once for each of the n choices of the initial vertex and once for each of the two directions of traversing the tour. Thus, the total number of tours is n!/(2n) = (n − 1)!/2. For example, with n = 4, there are three distinct tours:

[figure: the three distinct tours of a complete graph on the four vertices a, b, c, d]

Solution to Quiz 19.2

Correct answer: (b). We can enumerate tours by starting with the vertex a and trying all six possible orderings of the other three vertices, with the understanding that the tour finishes by traveling from the last vertex back to a. (Actually, this enumeration counts each tour twice, once in each direction.) The results:

Vertex Ordering           | Cost of Corresponding Tour
a, b, c, d or a, d, c, b  | 15
a, b, d, c or a, c, d, b  | 13
a, c, b, d or a, d, b, c  | 14

The shortest tour is the second one, with a total cost of 13.

19.2 Possible Levels of Expertise

Some computational problems are easier than others. The point of the theory of NP-hardness is to classify, in a precise sense, problems as either “computationally easy” (like the MST problem) or “computationally difficult” (like the TSP). This book is aimed both at readers looking for a white-belt primer on the topic and at those pursuing black-belt expertise. This section offers guidance on how to approach the rest of the book, as a function of your goals and constraints. What are your current and desired levels of expertise in recognizing and tackling NP-hard problems?[7]

[7] What’s up with the term “NP”? See Section 19.6.
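The enumeration in the quiz solutions above can be reproduced with a few lines of exhaustive search. Note the assumptions: the quiz’s figure did not survive text extraction, so the edge costs below are a reconstruction chosen to be consistent with the tour-cost table in Solution 19.2; the canonicalization step mirrors the (n − 1)!/2 counting argument from Solution 19.1.

```python
# Exhaustive search for the TSP on the (ASSUMED) Quiz 19.2 graph.
# Edge costs are reconstructed to match the tour-cost table: 15, 13, 14.
from itertools import permutations

cost = {frozenset(p): c for p, c in [
    (("a", "b"), 1), (("a", "c"), 2), (("a", "d"), 5),
    (("b", "c"), 3), (("b", "d"), 4), (("c", "d"), 6),
]}

def tour_cost(order):
    """Sum of edge costs around the cycle visiting `order` and returning home."""
    return sum(cost[frozenset((order[i], order[(i + 1) % len(order)]))]
               for i in range(len(order)))

# Fix vertex "a" as the start (killing the n choices of initial vertex); each
# tour is still counted twice, once per direction, so canonicalize by keeping
# the lexicographically smaller of a permutation and its reversal. That leaves
# (n - 1)!/2 distinct tours: 3 of them for n = 4.
tours = {min(p, tuple(reversed(p))): tour_cost(("a",) + p)
         for p in permutations(("b", "c", "d"))}
print(len(tours), min(tours.values()))  # 3 13
```

Brute force is fine at n = 4; at n = 20 there are already 19!/2 ≈ 6 × 10^16 tours, which is why Chapter 21 develops smarter exact algorithms such as Bellman-Held-Karp.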
Level 0: “What’s an NP-hard problem?”

Level 0 is total ignorance—you’ve never heard of NP-hardness and are unaware that many practically relevant computational problems are widely believed to be unsolvable by any fast algorithm. If I’ve done my job, this book should be accessible even to level-0 readers.

Level 1: “Oh, the problem is NP-hard? I guess we should either reformulate the problem, scale down our ambitions, or invest a lot more resources into solving it.”

Level 1 represents cocktail-party-level awareness and at least an informal understanding of what NP-hardness means.[8] For example, are you managing a software project with an algorithmic or optimization component? If so, you should acquire at least level-1 knowledge, in case one of your team members bumps into an NP-hard problem and wants to discuss the possible next steps. To raise your level to 1, study Sections 19.3, 19.4, and 19.6.

Level 2: “Oh, the problem is NP-hard? Give me a chance to apply my algorithmic expertise and see how far I can get.”

The biggest marginal empowerment for software engineers comes from reaching level 2 and acquiring a rich toolbox for developing practically useful algorithms for solving or approximating NP-hard problems. Serious programmers should shoot for this level (or above). Happily, all the algorithmic paradigms that we developed for polynomial-time solvable problems in Parts 1–3 are also useful for making headway on NP-hard problems. The goal of Chapters 20 and 21 is to bring you up to level 2; see also Section 19.4 for an overview and Chapter 24 for a detailed case study of the level-2 toolbox in action in a high-stakes application.

Level 3: “Tell me about your computational problem. [. . . listens carefully . . . ] My condolences, your problem is NP-hard.”

At level 3, you can quickly recognize NP-hard problems when they arise in practice (at which point you can switch to applying your level-2 skills). You know several famous NP-hard problems and also

[8] Speaking, as always, about sufficiently nerdy cocktail parties!