Practicing Trustworthy Machine Learning: Consistent, Transparent, and Fair AI Pipelines

Authors: Yada Pruksachatkun, Matthew McAteer, Subhabrata Majumdar

Education

With the increasing use of AI in high-stakes domains such as medicine, law, and defense, organizations spend a lot of time and money to make ML models trustworthy. Many books on the subject offer deep dives into theories and concepts. This guide provides a practical starting point to help development teams produce models that are secure, more robust, less biased, and more explainable. Authors Yada Pruksachatkun, Matthew McAteer, and Subhabrata Majumdar translate best practices in the academic literature for curating datasets and building models into a blueprint for building industry-grade trusted ML systems. With this book, engineers and data scientists will gain a much-needed foundation for releasing trustworthy ML applications into a noisy, messy, and often hostile world.

You'll learn:
• Methods to explain ML models and their outputs to stakeholders
• How to recognize and fix fairness concerns and privacy leaks in an ML pipeline
• How to develop ML systems that are robust and secure against malicious attacks
• Important systemic considerations, like how to manage trust debt and which ML obstacles require human intervention


📄 Text Preview (First 20 pages)


📄 Page 1
Practicing Trustworthy Machine Learning: Consistent, Transparent, and Fair AI Pipelines
Yada Pruksachatkun, Matthew McAteer & Subhabrata Majumdar
📄 Page 2
MACHINE LEARNING

"An excellent practical book with code examples on making AI systems more fair, private, explainable, and robust. Impressively, it has kept up with the ongoing Cambrian explosion of foundation models."
—Kush Varshney, Distinguished Research Scientist, Foundations of Trustworthy AI, IBM Research

Practicing Trustworthy Machine Learning
US $79.99  CAN $99.99
ISBN: 978-1-098-12027-6
Twitter: @oreillymedia
linkedin.com/company/oreilly-media
youtube.com/oreillymedia

With the increasing use of AI in high-stakes domains such as medicine, law, and defense, organizations spend a lot of time and money to make ML models trustworthy. Many books on the subject offer deep dives into theories and concepts. This guide provides a practical starting point to help development teams produce models that are secure, more robust, less biased, and more explainable. Authors Yada Pruksachatkun, Matthew McAteer, and Subhabrata (Subho) Majumdar translate best practices in the academic literature for curating datasets and transforming models into blueprints for building industry-grade trusted ML systems. With this book, engineers and data scientists will gain a much-needed foundation for releasing trustworthy ML applications into a noisy, messy, and often hostile world.

You'll learn:
• Methods to explain ML models and their outputs to stakeholders
• How to recognize and fix fairness concerns and privacy leaks in an ML pipeline
• How to develop ML systems that are robust and secure against malicious attacks
• Important systemic considerations, like how to manage trust debt and which ML obstacles require human intervention
• The important features behind your model's decisions

Yada Pruksachatkun is a machine learning scientist at Infinitus, a conversational AI startup that automates calls in the healthcare system. Matthew McAteer is the creator of 5cube Labs, an ML consultancy that has worked with over 100 companies in industries ranging from architecture to medicine to agriculture. Subho Majumdar is a machine learning scientist at Twitch, where he leads applied science efforts in responsible ML.
📄 Page 3
Praise for Practicing Trustworthy Machine Learning

An excellent practical book with code examples on making AI systems more fair, private, explainable, and robust. Impressively, it has kept up with the ongoing Cambrian explosion of foundation models.
—Kush Varshney, Distinguished Research Scientist, Foundations of Trustworthy AI, IBM Research

This book is a valuable and conscientiously written introduction to the increasingly important fields of AI safety, privacy, and interpretability, filled with lots of examples and code snippets to make it of practical use to machine learning practitioners.
—Timothy Nguyen, deep learning researcher, host of The Cartesian Cafe podcast

This is an impressive book that feels simultaneously foundational and cutting-edge. It is a valuable reference work for data scientists and engineers who want to be confident that the models they release into the world are safe and fair.
—Trey Causey, Head of AI Ethics, Indeed
📄 Page 4
(This page has no text content)
📄 Page 5
Yada Pruksachatkun, Matthew McAteer, and Subhabrata Majumdar
Practicing Trustworthy Machine Learning
Consistent, Transparent, and Fair AI Pipelines
Beijing • Boston • Farnham • Sebastopol • Tokyo
📄 Page 6
Practicing Trustworthy Machine Learning
by Yada Pruksachatkun, Matthew McAteer, and Subhabrata Majumdar
Copyright © 2023 Yada Pruksachatkun, Matthew McAteer, and Subhabrata Majumdar. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Nicole Butterfield
Development Editor: Sarah Grey
Production Editor: Katherine Tozer
Copyeditor: Paula L. Fleming
Proofreader: Piper Editorial Consulting, LLC
Indexer: nSight, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

January 2023: First Edition
Revision History for the First Release: 2023-01-03: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781098120276 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Practicing Trustworthy Machine Learning, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the authors and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

ISBN: 978-1-098-12027-6 [LSI]
📄 Page 7
This book is dedicated to the memory of security researcher, internet privacy activist, and AI ethics researcher Peter Eckersley (1979 to 2022). Thanks for your work on tools such as Let's Encrypt, Privacy Badger, Certbot, HTTPS Everywhere, SSL Observatory, and Panopticlick, and for advancing AI ethics in a pragmatic, policy-focused, and actionable way. Thank you also for offering to proofread this book in what unexpectedly turned out to be your last months.
📄 Page 8
(This page has no text content)
📄 Page 9
Table of Contents

Preface   xiii

1. Privacy   1
    Attack Vectors for Machine Learning Pipelines   1
    Improperly Implemented Privacy Features in ML: Case Studies   2
    Case 1: Apple's CSAM   3
    Case 2: GitHub Copilot   4
    Case 3: Model and Data Theft from No-Code ML Tools   5
    Definitions   6
    Definition of Privacy   6
    Proxies and Metrics for Privacy   6
    Legal Definitions of Privacy   8
    k-Anonymity   8
    Types of Privacy-Invading Attacks on ML Pipelines   8
    Membership Attacks   9
    Model Inversion   10
    Model Extraction   11
    Stealing a BERT-Based Language Model   13
    Defenses Against Model Theft from Output Logits   17
    Privacy-Testing Tools   19
    Methods for Preserving Privacy   20
    Differential Privacy   20
    Stealing a Differentially Privately Trained Model   21
    Further Differential Privacy Tooling   23
    Homomorphic Encryption   23
    Secure Multi-Party Computation   24
    SMPC Example   25
📄 Page 10
    Further SMPC Tooling   29
    Federated Learning   29
    Conclusion   30

2. Fairness and Bias   33
    Case 1: Social Media   34
    Case 2: Triaging Patients in Healthcare Systems   34
    Case 3: Legal Systems   35
    Key Concepts in Fairness and Fairness-Related Harms   36
    Individual Fairness   37
    Parity Fairness   37
    Calculating Parity Fairness   38
    Scenario 1: Language Generation   39
    Scenario 2: Image Captioning   43
    Fairness Harm Mitigation   45
    Mitigation Methods in the Pre-Processing Stage   47
    Mitigation Methods in the In-Processing Stage   47
    Mitigation Methods in the Post-Processing Stage   49
    Fairness Tool Kits   50
    How Can You Prioritize Fairness in Your Organization?   52
    Conclusion   52
    Further Reading   53

3. Model Explainability and Interpretability   55
    Explainability Versus Interpretability   55
    The Need for Interpretable and Explainable Models   56
    A Possible Trade-off Between Explainability and Privacy   57
    Evaluating the Usefulness of Interpretation or Explanation Methods   58
    Definitions and Categories   59
    "Black Box"   59
    Global Versus Local Interpretability   59
    Model-Agnostic Versus Model-Specific Methods   59
    Interpreting GPT-2   60
    Methods for Explaining Models and Interpreting Outputs   68
    Inherently Explainable Models   68
    Local Model-Agnostic Interpretability Methods   79
    Global Model-Agnostic Interpretability Methods   98
    Explaining Neural Networks   99
    Saliency Mapping   99
    Deep Dive: Saliency Mapping with CLIP   100
    Adversarial Counterfactual Examples   120
📄 Page 11
    Overcome the Limitations of Interpretability with a Security Mindset   121
    Limitations and Pitfalls of Explainable and Interpretable Methods   123
    Risks of Deceptive Interpretability   124
    Conclusion   125

4. Robustness   127
    Evaluating Robustness   129
    Non-Adversarial Robustness   129
    Step 1: Apply Perturbations   129
    Step 2: Defining and Applying Constraints   133
    Deep Dive: Word Substitution with Cosine Similarity Constraints   137
    Adversarial Robustness   140
    Deep Dive: Adversarial Attacks in Computer Vision   140
    Creating Adversarial Examples   144
    Improving Robustness   147
    Conclusion   148

5. Secure and Trustworthy Data Generation   149
    Case 1: Unsecured AWS Buckets   150
    Case 2: Clearview AI Scraping Photos from Social Media   150
    Case 3: Improperly Stored Medical Data   151
    Issues in Procuring Real-World Data   151
    Using the Right Data for the Modeling Goal   152
    Consent   152
    PII, PHI, and Secrets   152
    Proportionality and Sampling Techniques   153
    Undescribed Variation   153
    Unintended Proxies   153
    Failures of External Validity   154
    Data Integrity   154
    Setting Reasonable Expectations   155
    Tools for Addressing Data Collection Issues   155
    Synthetically Generated Data   157
    DALL·E, GPT-3, and Synthetic Data   157
    Improving Pattern Recognition with Synthetic Data   159
    Deep Dive: Pre-Training a Model with a Process-Driven Synthetic Dataset   160
    Facial Recognition, Pose Detection, and Human-Centric Tasks   161
    Object Recognition and Related Tasks   163
    Environment Navigation   164
    Unity and Unreal Environments   165
    Limitations of Synthetic Data in Healthcare   166
📄 Page 12
    Limitations of Synthetic Data in NLP   168
    Self-Supervised Learned Models Versus Giant Natural Datasets   168
    Repurposing Quality Control Metrics for Security Purposes   169
    Conclusion   169

6. More State-of-the-Art Research Questions   171
    Making Sense of Improperly Overhyped Research Claims   171
    Shallow Human-AI Comparison Antipattern   172
    Downplaying the Limitations of the Technique Antipattern   172
    Uncritical PR Piece Antipattern   173
    Hyperbolic or Just Plain Wrong Antipattern   174
    Getting Past These Antipatterns   174
    Quantized ML   175
    Tooling for Quantized ML   179
    Privacy, Bias, Interpretability, and Stability in Quantized ML   180
    Diffusion-Based Energy Models   181
    Homomorphic Encryption   183
    Simulating Federated Learning   188
    Quantum Machine Learning   190
    Tooling and Resources for Quantum Machine Learning   193
    Why QML Will Not Solve Your Regular ML Problems   195
    Making the Leap from Theory to Practice   196

7. From Theory to Practice   197
    Part I: Additional Technical Factors   197
    Causal Machine Learning   198
    Sparsity and Model Compression   203
    Uncertainty Quantification   206
    Part II: Implementation Challenges   212
    Motivating Stakeholders to Develop Trustworthy ML Systems   212
    Trust Debts   215
    Important Aspects of Trust   221
    Evaluation and Feedback   223
    Trustworthiness and MLOps   224
    Conclusion   227

8. An Ecosystem of Trust   229
    Tooling   229
    LiFT   230
    Datasheets   230
    Model Cards   232
📄 Page 13
    DAG Cards   234
    Human-in-the-Loop Steps   236
    Oversight Guidelines   236
    Stages of Assessment   238
    The Need for a Cross-Project Approach   240
    MITRE ATLAS   241
    Benchmarks   242
    AI Incident Database   243
    Bug Bounties   243
    Deep Dive: Connecting the Dots   244
    Data   245
    Pre-Processing   247
    Model Training   248
    Model Inference   249
    Trust Components   250
    Conclusion   254

A. Synthetic Data Generation Tools   255

B. Other Interpretability and Explainability Tool Kits   261

Index   265
📄 Page 14
(This page has no text content)
📄 Page 15
Preface

We live in a world where machine learning (ML) systems are used in increasingly high-stakes domains like medicine, law, and defense. Model decisions can result in economic gains or losses in the millions or billions of dollars. Because of the high-stakes nature of their decisions and consequences, it is important for these ML systems to be trustworthy. This can be a problem when the ML systems are not secure, may fail unpredictably, have notable performance disparities across sample groups, and/or struggle to explain their decisions. We wrote this book to help your ML models stand up on their own in the real world.

Implementing Machine Learning in Production

If you're reading this book, you are probably already aware of the incredibly outsized importance of ML. Regardless of the fields of application, ML techniques touch all of our lives. Google Brain cofounder Andrew Ng was not exaggerating when he described AI as "the new electricity". After all, what we have on our hands could best be described as a universal function approximator. Much like electricity, ML can be dangerous if not handled properly. Like a discharge from a high-voltage wire colliding with a mylar balloon, cases of ML failure can be unexpected and scary.

Deploying ML applications in the real world is quite different from working on models in closed environments. Academic datasets often do not carry the full variation of real-world data. Data that our models interact with in the future may not resemble the data of the past, especially if someone cut corners in getting this data. It could include all sorts of biases that the model could learn from, thereby putting whoever deployed it in a hairy ethical and/or legal situation. The situation may be made worse by the fact that you cannot fully explain why your ML model is behaving the way it does. Even if all goes well on those fronts, you're not out of the woods yet. Hackers are getting more sophisticated every year and may eventually figure out how to steal sensitive data just by querying your deployed model.
📄 Page 16
The prognosis isn't all doom and gloom, though. There are well-studied best practices for curating datasets, both for real-world data and synthetic data. There are plenty of ways to measure just how different new incoming data is from the data you already have. Just as there are ways of spotting and fixing bias in ML, there are new ways of making your ML pipelines explainable and interpretable in general. As for security and robustness, some of the largest ML companies in the world are releasing tool kits for helping you obscure sensitive model details from nosy outsiders. All these ways of repairing the metaphorical wiring of your ML pipeline are discussed in this book, from classic solutions to the cutting edge.

The Transformer Convergence

In the late 2010s and early 2020s, not long before we began writing this book, a deep learning model architecture called "transformer" had been making waves in the natural language processing (NLP) space. Over the course of this writing, the pace of transformer adoption has only accelerated. This approach is quickly becoming a standard tool in computer vision, tabular data processing, and even reinforcement learning. It's a huge departure from how deep learning worked in the early 2010s, when each task and domain had such unique and distinct architectures that it was hard for a computer vision expert to fully understand NLP research (and it was often difficult for NLP researchers to understand computer vision methods in meaningful depth as well).

The transformer is an ML architecture that first appeared in the 2017 paper "Attention Is All You Need."[1] In previous neural network approaches, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the system would first focus on local patches of input data and then build up to the whole. By contrast, with a transformer model, every element of the input data connects (or pays attention to) every other element. This approach means that the transformer can make sense of the entire dataset it's trained on.

This ability to make connections between data points across an entire dataset is key to the transformer's usefulness. Transformer models have become front-runners on tasks such as question answering, text prediction, and translation. More recently, this has extended beyond NLP to vision domains like image classification.[2] This convergence around transformers is a recent phenomenon, but it's clear that it will continue to grow into the future.

[1] Ashish Vaswani et al., "Attention Is All You Need", NeurIPS Proceedings (2017).
[2] Kai Han et al., "A Survey on Vision Transformer", IEEE Transactions on Pattern Analysis and Machine Intelligence (2022).
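Since attention is central to everything that follows, a minimal sketch may help make the idea concrete. The NumPy snippet below is not code from the book; it uses toy dimensions and random weights purely to illustrate single-head scaled dot-product self-attention, in which every position in the input sequence attends to every other position.

```python
# Minimal sketch of scaled dot-product self-attention (illustrative only).
# Real transformers add multi-head attention, learned projections, masking,
# residual connections, and feed-forward layers on top of this core operation.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Every position scores every other position: (seq_len, seq_len) matrix
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)
    # Each output row is a weighted mix of all value vectors in the sequence
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))   # stand-in for token embeddings
Wq = rng.normal(size=(d_model, d_k))
Wk = rng.normal(size=(d_model, d_k))
Wv = rng.normal(size=(d_model, d_k))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

The all-pairs attention pattern shown here is the part that lets every token condition directly on every other token, which is what distinguishes the transformer from the local, build-up-from-patches processing of CNNs and RNNs.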
📄 Page 17
While transformers should not be used for every single problem (for example, there are plenty of circumstances where less computational- and memory-intensive methods work best), we make transformer-based models a focus of this book given the recent trend in this area.

An Explosion of Large and Highly Capable ML Models

Not only have transformers become ubiquitous, but they've also been used to put into the hands of many people AI systems whose capabilities would have seemed like science fiction just a decade ago. In 2020, OpenAI released GPT-3, a language model that can generate text that is in many cases indistinguishable from human-written text. Even as companies are building their products around these models,[3] we are still discovering new capabilities. For example, in 2022, it was discovered that one could greatly boost GPT-3's performance on reasoning benchmarks like MultiArith (jumping from 17.7% to 78.7% accuracy) and GSM8K (jumping from 10.4% to 40.7% accuracy). How was this amazing leap in capability achieved? It simply involved prompting GPT-3 to complete an answer that was prefilled with "Let's think step by step" before each answer.[4] The strangeness does not stop there, as this prompting can cause language models to output reasoning steps that may not necessarily arrive at an answer at all (you need further prompting and querying to get the actual answer).[5][6]

Another notable ML model that came about at the time we were writing this book was StableDiffusion, a text-to-image model that can generate images from text descriptions. It was trained in the same manner as text-to-image models like OpenAI's DALL·E 2, Google's Imagen, Google's Parti, and MidJourney and thus had roughly similar quality of output. Unlike these other models, the underlying code and the full model weights were released to the public. The release of this capable model was a big deal for the ML safety community. It went against the ethos of keeping highly capable ML models private until their consequences and safety can be evaluated.

[3] Matthew McAteer's blog provides examples of companies building on top of GPT-3.
[4] Takeshi Kojima et al., "Large Language Models Are Zero-Shot Reasoners", arXiv preprint (2022).
[5] See Antonia Creswell et al. (who are affiliated with DeepMind) on using prompting for interpretable composable reasoning: "Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning", arXiv preprint (2022).
[6] This isn't even getting into the possible consequences of talking about prompt engineering in a book that's accessible on the internet and thus might be used as part of the training data in a future large language model like a GPT-3 successor.
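As a concrete illustration of the "Let's think step by step" prompting trick described above, here is a minimal sketch of how such a prompt is assembled. GPT-3 itself requires API access, so this sketch substitutes the small, openly available GPT-2 model via Hugging Face's text-generation pipeline purely to show the mechanics; a model this small will not reproduce the benchmark gains reported for GPT-3, and the example question is just an illustrative arithmetic word problem, not one taken from the book.

```python
# Minimal sketch of zero-shot "step by step" prompting (not the book's code).
# GPT-2 stands in for GPT-3 here purely to show how the prompt is constructed.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

question = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls, "
    "each with 3 balls. How many tennis balls does he have now?"
)
# The trick: prefill the answer with "Let's think step by step." so the model
# generates intermediate reasoning before (hopefully) reaching an answer.
prompt = f"Q: {question}\nA: Let's think step by step."

output = generator(prompt, max_new_tokens=80, do_sample=False)
print(output[0]["generated_text"])
# As noted above, a second round of prompting or parsing is typically needed
# to extract the final numeric answer from the generated reasoning.
```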
📄 Page 18
In the case of StableDiffusion, the authors released a variety of harm-reduction tools at the same time the highly capable model was released.[7][8] While this best practice should be encouraged, it also highlights how underresourced a lot of ML safety initiatives were, even for much lower-stakes ML models and pipelines.

After all, we've seen many similar new image/language models pouring out of competing companies and teams like Google, DeepMind, OpenAI, and Microsoft. Since these projects are being built in parallel, and with comparable results, the generation of new ideas is not a bottleneck. In some cases, it might suggest that progress won't be slowed down by just one team or organization opting out, which creates perverse incentives. One team might decide to get ahead by not imposing limitations on its text or image generation tool. While teams at larger organizations have been slow to develop products because of these safety concerns, it's also hard to stop an engineer from one of these teams from defecting to a startup that wants to move much faster in making a product. Since these similar projects are being developed in parallel, it seems secrecy no longer offers as much protection as it once did.

As such, it seems like one of the most promising ways to make sure safety is considered is for the organizations to be as public as possible about both their perception of safety risks and their proposed solutions for those risks.[9] It's for this reason that we wrote this book.

Why We Wrote This Book

As people who have both conducted research in ML and worked on ML systems that have been successfully deployed, we've noticed that the gap between building an initial ML model for a static dataset and deployment is large. A major part of this gap is in lack of trustworthiness. There are so many ways in which ML models that work in development can fail in production. Many large companies have dedicated responsible AI and safety teams to analyze the potential risks and consequences of both their current and potential future ML systems.[10] Unfortunately, the vast majority of teams and companies using ML do not have the bandwidth to do this. Even in cases where such teams exist, they are often underresourced, and the model development cycles

[7] See Stability.ai's announcement on Twitter of their Deep Fake detection initiative using the new OpenCLIP models among other techniques.
[8] Beyond text-to-image models like StableDiffusion, other organizations are following a similar approach in releasing large models. Meta AI released the 175-billion parameter Open Pretrained Transformer, comparable in size to GPT-3, as open source.
[9] This also has the bonus effect of letting would-be defectors know that they are defecting, and it increases the reputational cost of implementing an unsafe AI system while decreasing the cost of reducing AI risk.
[10] For example, in 2021 DeepMind's ethics team published the paper "Ethical and Social Risks of Harm from Language Models", and OpenAI updated their stance on AI safety on their blog in March 2022.
📄 Page 19
may be too fast for the safety team to keep up with for fear that a competitor will release a similar model first.

We wrote this book to lower the barrier to entry for understanding how to create ML models that are trustworthy. While a lot of titles already exist on this subject, we wanted to create a resource that was accessible to people without a background in machine learning research that teaches frameworks and ways to think about trustworthiness, as well as some methods to evaluate and improve the trustworthiness of models. This includes:

• Code blocks to copy and paste into your own projects
• Lists of links to open source projects and resources
• Links to in-depth code tutorials, many of which can be explored in-browser

While there's no replacement for experience, in order to get experience, you need to know where to start in the first place. This book is meant to provide that much-needed foundation for releasing your machine learning applications into the noisy, messy, sometimes hostile real world. This work stands on the shoulders of countless other researchers, engineers, and more—we hope this work will help translate some of that work for people working to deploy ML systems.

Who This Book Is For

This book is written for anyone who is currently working with machine learning models and wants to be sure that the fruits of their labor will not cause unintended harm when released into the real world.

The primary audience of the book is engineers and data scientists who have some familiarity with machine learning. Parts of the book should be accessible to non-engineers, such as product managers and executives with a conceptual understanding of ML. Some of you may be building ML systems that make higher-stakes decisions than you encountered in your previous job or in academia. We assume you are familiar with the very basics of deep learning and with Python for the code samples.

An initial reading will allow engineers to gain a solid understanding of trustworthiness and how it may apply to the ML systems you are using. As you continue on your ML career, you can refer back and adapt code snippets from the book to evaluate and ensure aspects of trustworthiness in your systems.
📄 Page 20
AI Safety and Alignment

There's a big field of study focused on the problems of AI safety and AI alignment. AI alignment is the problem of how to make AI systems that do what humans want without unintended side effects. This is a subset of AI safety, which deals with mitigating a far wider-reaching space of possible problems with AI systems. These problems range from perpetuating societal biases without possibility of correction, to being used by humans in domains like warfare or fraud or cybercrime, to exhibiting behaviors that no human of any culture or affiliation would ever want. AI alignment is seen as the solution to AI safety risks because it involves getting the AI to fully understand and reliably respect human values. There is truly an enormous amount of writing about trustworthy machine learning from a theoretical and/or academic perspective.

One problem is that a lot of this writing tries to clearly define terms from psychology (e.g., intent, desire, goal, and motivation) and philosophy (e.g., value system and utility), but few of these definitions would be useful to an engineer who is actually tasked with building the AI system. Humans might one day build an AI system that truly mimics the human brain down to the level of neurons and synapses, and in that scenario such philosophical and psychological descriptors would be useful. However, speaking from one of the authors' prior experiences as a wet lab neuroscientist, modern neural networks have very little in common with the human brain at all. Real living neurons are not like logic gates, usually requiring something like 10,000 coupled and nonlinear differential equations to describe their behavior. Simulating a single neuron is usually a task for an entire dedicated artificial neural network rather than just a single weight and bias.[11] It's not clear that we can ever arrive at a way to prove mathematically that our AI system won't ever cause harm. Still, as organizations like Cohere, OpenAI, and AI21 Labs have shown,[12] there's still a lot that can be done to preempt common problems and institute best practices.

Another challenge is that a lot of AI safety literature focuses on hypothetical future scenarios like artificial general intelligence (AGI) and self-improving AI systems.[13][14] This isn't completely removed from the real world. During the writing of this book,

[11] Allison Whitten, "How Computationally Complex Is a Single Neuron?", Quanta Magazine, September 2, 2021. This article summarizes the results of the paper by David Beniaguev et al., "Single Cortical Neurons Are Deep Artificial Neural Networks", Neuron (2021).
[12] Cohere Team, "Best Practices for Deploying Language Models", co:here, June 2, 2022.
[13] Nick Bostrom's "Superintelligence" outlines scenarios in which such a system could emerge from any of the various AI research labs and then grow beyond the ability of humans to contain it.
[14] Popular internet essayist Gwern wrote "It Looks Like You're Trying to Take Over the World", a short story designed to help readers imagine a scenario where AI research not too far from the current state of the art could cause a catastrophe.