Uploader: 高宏飞
Shared on 2025-12-22

Author: Hala Nelson

All the math we need to get into AI. Math and AI made easy.

Many industries are eager to integrate AI and data-driven technologies into their systems and operations. But to build truly successful AI systems, you need a firm grasp of the underlying mathematics. This comprehensive guide bridges the gap in presentation between the potential and applications of AI and its relevant mathematical foundations. In an immersive and conversational style, the book surveys the mathematics necessary to thrive in the AI field, focusing on real-world applications and state-of-the-art models rather than on dense academic theory. You'll explore topics such as regression, neural networks, convolution, optimization, probability, graphs, random walks, Markov processes, differential equations, and more, within an exclusive AI context geared toward computer vision, natural language processing, generative models, reinforcement learning, operations research, and automated systems. With a broad audience in mind, including engineers, data scientists, mathematicians, scientists, and people early in their careers, the book helps build a solid foundation for success in the AI and math fields.

You'll be able to:
• Comfortably speak the languages of AI, machine learning, data science, and mathematics
• Unify machine learning models and natural language models under one mathematical structure
• Handle graph and network data with ease
• Explore real data, visualize space transformations, reduce dimensions, and process images
• Decide on which models to use for different data-driven projects
• Explore the various implications and limitations of AI

ISBN: 1098107632
Publisher: O'Reilly Media
Publish Year: 2023
Language: English
Pages: 605
File Format: PDF
File Size: 27.5 MB
Text Preview (First 20 pages)
Registered users can read the full content for free


(This page has no text content)
[Back cover]

“Technology and AI markets are like a river, where some parts are moving faster than others. Successfully applying AI requires the skill of assessing the direction of the flow and complementing it with a strong foundation, which this book enables in an engaging, delightful, and inclusive way. Hala has made math fun for a spectrum of participants in the AI-enabled future!”
—Adri Purkayastha, Group Head, AI Operational Risk and Digital Risk Analytics, BNP Paribas

Many sectors and industries are eager to integrate AI and data-driven technologies into their systems and operations. But to build truly successful AI systems, you need a firm grasp of the underlying mathematics. This comprehensive guide bridges the current gap in presentation between the unlimited potential and applications of AI and its relevant mathematical foundations. Rather than discussing dense academic theory, author Hala Nelson surveys the mathematics necessary to thrive in the AI field, focusing on real-world applications and state-of-the-art models. You’ll explore topics such as regression, neural networks, convolution, optimization, probability, Markov processes, differential equations, and more within an exclusive AI context. Engineers, data scientists, mathematicians, and scientists will gain a solid foundation for success in the AI and math fields.

Essential Math for AI · US $79.99 · CAN $99.99 · ISBN: 978-1-098-10763-5
You’ll be able to:
• Comfortably speak the languages of AI, machine learning, data science, and mathematics
• Unify machine learning models and natural language models under one mathematical structure
• Handle graph and network data with ease
• Explore real data, visualize space transformations, reduce dimensions, and process images
• Decide on which models to use for different data-driven projects
• Explore the various implications and limitations of AI

Hala Nelson is an associate professor of mathematics at James Madison University who specializes in mathematical modeling and consults for the public sector on emergency and infrastructure services. She has a PhD in mathematics from the Courant Institute of Mathematical Sciences at New York University.
Praise for Essential Math for AI

“Technology and AI markets are like a river, where some parts are moving faster than others. Successfully applying AI requires the skill of assessing the direction of the flow and complementing it with a strong foundation, which this book enables, in an engaging, delightful, and inclusive way. Hala has made math fun for a spectrum of participants in the AI-enabled future!”
—Adri Purkayastha, Group Head, AI Operational Risk and Digital Risk Analytics, BNP Paribas

“Texts on artificial intelligence are usually either technical manuscripts written by experts for other experts, or cursory, math-free introductions catered to general audiences. This book takes a refreshing third path by introducing the mathematical foundations for readers in business, data, and similar fields without advanced mathematics degrees. The author weaves elegant equations and pithy observations throughout, all the while asking the reader to consider the very serious implications artificial intelligence has on society. I recommend Essential Math for AI to anyone looking for a rigorous treatment of AI fundamentals viewed through a practical lens.”
—George Mount, Data Analyst and Educator

“Hala has done a great job in explaining crucial mathematical concepts. This is a must-read for every serious machine learning practitioner. You’d love the field more once you go through the book.”
—Umang Sharma, Senior Data Scientist and Author

“To understand artificial intelligence, one needs to understand the relationship between math and AI. Dr. Nelson made this easy by giving us the foundation on which the symbiotic relationship between the two disciplines is built.”
—Huan Nguyen, Rear Admiral (Ret.), Cyber Engineering, NAVSEA
Hala Nelson

Essential Math for AI
Next-Level Mathematics for Efficient and Successful AI Systems

Beijing · Boston · Farnham · Sebastopol · Tokyo
Essential Math for AI
by Hala Nelson

Copyright © 2023 Hala Nelson. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Aaron Black
Development Editor: Angela Rufino
Production Editor: Kristen Brown
Copyeditor: Sonia Saruba
Proofreader: JM Olejarz
Indexer: nSight, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

January 2023: First Edition

Revision History for the First Edition
2023-01-04: First Release
2023-02-03: Second Release
See http://oreilly.com/catalog/errata.csp?isbn=9781098107635 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Essential Math for AI, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

ISBN: 978-1-098-10763-5 [LSI]
Table of Contents

Preface  xvii

1. Why Learn the Mathematics of AI?  1
  What Is AI?  2
  Why Is AI So Popular Now?  3
  What Is AI Able to Do?  3
  An AI Agent’s Specific Tasks  4
  What Are AI’s Limitations?  6
  What Happens When AI Systems Fail?  8
  Where Is AI Headed?  8
  Who Are the Current Main Contributors to the AI Field?  10
  What Math Is Typically Involved in AI?  10
  Summary and Looking Ahead  11

2. Data, Data, Data  13
  Data for AI  14
  Real Data Versus Simulated Data  16
  Mathematical Models: Linear Versus Nonlinear  16
  An Example of Real Data  18
  An Example of Simulated Data  21
  Mathematical Models: Simulations and AI  25
  Where Do We Get Our Data From?  27
  The Vocabulary of Data Distributions, Probability, and Statistics  29
  Random Variables  30
  Probability Distributions  30
  Marginal Probabilities  31
  The Uniform and the Normal Distributions  31
  Conditional Probabilities and Bayes’ Theorem  31
  Conditional Probabilities and Joint Distributions  31
  Prior Distribution, Posterior Distribution, and Likelihood Function  32
  Mixtures of Distributions  32
  Sums and Products of Random Variables  32
  Using Graphs to Represent Joint Probability Distributions  33
  Expectation, Mean, Variance, and Uncertainty  33
  Covariance and Correlation  33
  Markov Process  34
  Normalizing, Scaling, and/or Standardizing a Random Variable or Data Set  34
  Common Examples  34
  Continuous Distributions Versus Discrete Distributions (Density Versus Mass)  35
  The Power of the Joint Probability Density Function  37
  Distribution of Data: The Uniform Distribution  38
  Distribution of Data: The Bell-Shaped Normal (Gaussian) Distribution  40
  Distribution of Data: Other Important and Commonly Used Distributions  43
  The Various Uses of the Word “Distribution”  47
  A/B Testing  48
  Summary and Looking Ahead  48

3. Fitting Functions to Data  51
  Traditional and Very Useful Machine Learning Models  53
  Numerical Solutions Versus Analytical Solutions  55
  Regression: Predict a Numerical Value  56
  Training Function  58
  Loss Function  60
  Optimization  71
  Logistic Regression: Classify into Two Classes  85
  Training Function  85
  Loss Function  86
  Optimization  88
  Softmax Regression: Classify into Multiple Classes  88
  Training Function  90
  Loss Function  92
  Optimization  92
  Incorporating These Models into the Last Layer of a Neural Network  93
  Other Popular Machine Learning Techniques and Ensembles of Techniques  94
  Support Vector Machines  94
  Decision Trees  98
  Random Forests  107
  k-means Clustering  108
  Performance Measures for Classification Models  109
  Summary and Looking Ahead  110

4. Optimization for Neural Networks  113
  The Brain Cortex and Artificial Neural Networks  113
  Training Function: Fully Connected, or Dense, Feed Forward Neural Networks  115
  A Neural Network Is a Computational Graph Representation of the Training Function  117
  Linearly Combine, Add Bias, Then Activate  117
  Common Activation Functions  122
  Universal Function Approximation  125
  Approximation Theory for Deep Learning  131
  Loss Functions  131
  Optimization  133
  Mathematics and the Mysterious Success of Neural Networks  134
  Gradient Descent ω_{i+1} = ω_i − η∇L(ω_i)  135
  Explaining the Role of the Learning Rate Hyperparameter η  137
  Convex Versus Nonconvex Landscapes  140
  Stochastic Gradient Descent  143
  Initializing the Weights ω_0 for the Optimization Process  144
  Regularization Techniques  145
  Dropout  145
  Early Stopping  146
  Batch Normalization of Each Layer  146
  Control the Size of the Weights by Penalizing Their Norm  147
  Penalizing the l2 Norm Versus Penalizing the l1 Norm  150
  Explaining the Role of the Regularization Hyperparameter α  151
  Hyperparameter Examples That Appear in Machine Learning  152
  Chain Rule and Backpropagation: Calculating ∇L(ω_i)  153
  Backpropagation Is Not Too Different from How Our Brain Learns  154
  Why Is It Better to Backpropagate?  155
  Backpropagation in Detail  155
  Assessing the Significance of the Input Data Features  157
  Summary and Looking Ahead  158

5. Convolutional Neural Networks and Computer Vision  161
  Convolution and Cross-Correlation  163
  Translation Invariance and Translation Equivariance  167
  Convolution in Usual Space Is a Product in Frequency Space  167
  Convolution from a Systems Design Perspective  168
  Convolution and Impulse Response for Linear and Translation Invariant Systems  169
  Convolution and One-Dimensional Discrete Signals  171
  Convolution and Two-Dimensional Discrete Signals  172
  Filtering Images  174
  Feature Maps  178
  Linear Algebra Notation  179
  The One-Dimensional Case: Multiplication by a Toeplitz Matrix  182
  The Two-Dimensional Case: Multiplication by a Doubly Block Circulant Matrix  182
  Pooling  183
  A Convolutional Neural Network for Image Classification  184
  Summary and Looking Ahead  186

6. Singular Value Decomposition: Image Processing, Natural Language Processing, and Social Media  187
  Matrix Factorization  188
  Diagonal Matrices  191
  Matrices as Linear Transformations Acting on Space  193
  Action of A on the Right Singular Vectors  194
  Action of A on the Standard Unit Vectors and the Unit Square Determined by Them  195
  Action of A on the Unit Circle  196
  Breaking Down the Circle-to-Ellipse Transformation According to the Singular Value Decomposition  197
  Rotation and Reflection Matrices  198
  Action of A on a General Vector x  199
  Three Ways to Multiply Matrices  200
  The Big Picture  201
  The Condition Number and Computational Stability  203
  The Ingredients of the Singular Value Decomposition  204
  Singular Value Decomposition Versus the Eigenvalue Decomposition  204
  Computation of the Singular Value Decomposition  206
  Computing an Eigenvector Numerically  207
  The Pseudoinverse  208
  Applying the Singular Value Decomposition to Images  209
  Principal Component Analysis and Dimension Reduction  212
  Principal Component Analysis and Clustering  214
  A Social Media Application  214
  Latent Semantic Analysis  215
  Randomized Singular Value Decomposition  216
  Summary and Looking Ahead  216

7. Natural Language and Finance AI: Vectorization and Time Series  219
  Natural Language AI  222
  Preparing Natural Language Data for Machine Processing  223
  Statistical Models and the log Function  226
  Zipf’s Law for Term Counts  226
  Various Vector Representations for Natural Language Documents  227
  Term Frequency Vector Representation of a Document or Bag of Words  227
  Term Frequency-Inverse Document Frequency Vector Representation of a Document  228
  Topic Vector Representation of a Document Determined by Latent Semantic Analysis  228
  Topic Vector Representation of a Document Determined by Latent Dirichlet Allocation  232
  Topic Vector Representation of a Document Determined by Latent Discriminant Analysis  233
  Meaning Vector Representations of Words and of Documents Determined by Neural Network Embeddings  234
  Cosine Similarity  241
  Natural Language Processing Applications  243
  Sentiment Analysis  243
  Spam Filter  244
  Search and Information Retrieval  244
  Machine Translation  246
  Image Captioning  247
  Chatbots  247
  Other Applications  247
  Transformers and Attention Models  247
  The Transformer Architecture  248
  The Attention Mechanism  251
  Transformers Are Far from Perfect  255
  Convolutional Neural Networks for Time Series Data  255
  Recurrent Neural Networks for Time Series Data  257
  How Do Recurrent Neural Networks Work?  258
  Gated Recurrent Units and Long Short-Term Memory Units  260
  An Example of Natural Language Data  261
  Finance AI  261
  Summary and Looking Ahead  262

8. Probabilistic Generative Models  263
  What Are Generative Models Useful For?  264
  The Typical Mathematics of Generative Models  265
  Shifting Our Brain from Deterministic Thinking to Probabilistic Thinking  268
  Maximum Likelihood Estimation  270
  Explicit and Implicit Density Models  272
  Explicit Density-Tractable: Fully Visible Belief Networks  273
  Example: Generating Images via PixelCNN and Machine Audio via WaveNet  273
  Explicit Density-Tractable: Change of Variables Nonlinear Independent Component Analysis  276
  Explicit Density-Intractable: Variational Autoencoders Approximation via Variational Methods  277
  Explicit Density-Intractable: Boltzmann Machine Approximation via Markov Chain  279
  Implicit Density-Markov Chain: Generative Stochastic Network  279
  Implicit Density-Direct: Generative Adversarial Networks  280
  How Do Generative Adversarial Networks Work?  281
  Example: Machine Learning and Generative Networks for High Energy Physics  283
  Other Generative Models  285
  Naive Bayes Classification Model  286
  Gaussian Mixture Model  287
  The Evolution of Generative Models  288
  Hopfield Nets  290
  Boltzmann Machine  290
  Restricted Boltzmann Machine (Explicit Density and Intractable)  291
  The Original Autoencoder  292
  Probabilistic Language Modeling  293
  Summary and Looking Ahead  295

9. Graph Models  297
  Graphs: Nodes, Edges, and Features for Each  299
  Example: PageRank Algorithm  302
  Inverting Matrices Using Graphs  307
  Cayley Graphs of Groups: Pure Algebra and Parallel Computing  308
  Message Passing Within a Graph  309
  The Limitless Applications of Graphs  310
  Brain Networks  311
  Spread of Disease  312
  Spread of Information  312
  Detecting and Tracking Fake News Propagation  312
  Web-Scale Recommendation Systems  314
  Fighting Cancer  314
  Biochemical Graphs  315
  Molecular Graph Generation for Drug and Protein Structure Discovery  316
  Citation Networks  316
  Social Media Networks and Social Influence Prediction  316
  Sociological Structures  317
  Bayesian Networks  317
  Traffic Forecasting  317
  Logistics and Operations Research  318
  Language Models  318
  Graph Structure of the Web  320
  Automatically Analyzing Computer Programs  321
  Data Structures in Computer Science  321
  Load Balancing in Distributed Networks  322
  Artificial Neural Networks  323
  Random Walks on Graphs  324
  Node Representation Learning  326
  Tasks for Graph Neural Networks  327
  Node Classification  327
  Graph Classification  328
  Clustering and Community Detection  329
  Graph Generation  329
  Influence Maximization  329
  Link Prediction  330
  Dynamic Graph Models  330
  Bayesian Networks  331
  A Bayesian Network Represents a Compactified Conditional Probability Table  333
  Making Predictions Using a Bayesian Network  334
  Bayesian Networks Are Belief Networks, Not Causal Networks  334
  Keep This in Mind About Bayesian Networks  335
  Chains, Forks, and Colliders  336
  Given a Data Set, How Do We Set Up a Bayesian Network for the Involved Variables?  337
  Graph Diagrams for Probabilistic Causal Modeling  338
  A Brief History of Graph Theory  340
  Main Considerations in Graph Theory  341
  Spanning Trees and Shortest Spanning Trees  341
  Cut Sets and Cut Vertices  342
  Planarity  342
  Graphs as Vector Spaces  343
  Realizability  343
  Coloring and Matching  344
  Enumeration  344
  Algorithms and Computational Aspects of Graphs  344
  Summary and Looking Ahead  345

10. Operations Research  347
  No Free Lunch  349
  Complexity Analysis and O() Notation  350
  Optimization: The Heart of Operations Research  353
  Thinking About Optimization  356
  Optimization: Finite Dimensions, Unconstrained  357
  Optimization: Finite Dimensions, Constrained Lagrange Multipliers  357
  Optimization: Infinite Dimensions, Calculus of Variations  360
  Optimization on Networks  365
  Traveling Salesman Problem  365
  Minimum Spanning Tree  366
  Shortest Path  367
  Max-Flow Min-Cut  368
  Max-Flow Min-Cost  369
  The Critical Path Method for Project Design  369
  The n-Queens Problem  370
  Linear Optimization  371
  The General Form and the Standard Form  372
  Visualizing a Linear Optimization Problem in Two Dimensions  373
  Convex to Linear  374
  The Geometry of Linear Optimization  377
  The Simplex Method  379
  Transportation and Assignment Problems  386
  Duality, Lagrange Relaxation, Shadow Prices, Max-Min, Min-Max, and All That  386
  Sensitivity  401
  Game Theory and Multiagents  402
  Queuing  404
  Inventory  405
  Machine Learning for Operations Research  405
  Hamilton-Jacobi-Bellman Equation  406
  Operations Research for AI  407
  Summary and Looking Ahead  407

11. Probability  411
  Where Did Probability Appear in This Book?  412
  What More Do We Need to Know That Is Essential for AI?  415
  Causal Modeling and the Do Calculus  415
  An Alternative: The Do Calculus  417
  Paradoxes and Diagram Interpretations  420
  Monty Hall Problem  420
  Berkson’s Paradox  422
  Simpson’s Paradox  422
  Large Random Matrices  424
  Examples of Random Vectors and Random Matrices  424
  Main Considerations in Random Matrix Theory  427
  Random Matrix Ensembles  429
  Eigenvalue Density of the Sum of Two Large Random Matrices  430
  Essential Math for Large Random Matrices  430
  Stochastic Processes  432
  Bernoulli Process  433
  Poisson Process  433
  Random Walk  434
  Wiener Process or Brownian Motion  435
  Martingale  435
  Levy Process  436
  Branching Process  436
  Markov Chain  436
  Itô’s Lemma  437
  Markov Decision Processes and Reinforcement Learning  438
  Examples of Reinforcement Learning  438
  Reinforcement Learning as a Markov Decision Process  439
  Reinforcement Learning in the Context of Optimal Control and Nonlinear Dynamics  441
  Python Library for Reinforcement Learning  441
  Theoretical and Rigorous Grounds  441
  Which Events Have a Probability?  442
  Can We Talk About a Wider Range of Random Variables?  443
  A Probability Triple (Sample Space, Sigma Algebra, Probability Measure)  443
  Where Is the Difficulty?  444
  Random Variable, Expectation, and Integration  445
  Distribution of a Random Variable and the Change of Variable Theorem  446
  Next Steps in Rigorous Probability Theory  447
  The Universality Theorem for Neural Networks  448
  Summary and Looking Ahead  448

12. Mathematical Logic  451
  Various Logic Frameworks  452
  Propositional Logic  452
  From Few Axioms to a Whole Theory  455
  Codifying Logic Within an Agent  456
  How Do Deterministic and Probabilistic Machine Learning Fit In?  456
  First-Order Logic  457
  Relationships Between For All and There Exist  458
  Probabilistic Logic  460
  Fuzzy Logic  460
  Temporal Logic  461
  Comparison with Human Natural Language  462
  Machines and Complex Mathematical Reasoning  462
  Summary and Looking Ahead  463

13. Artificial Intelligence and Partial Differential Equations  465
  What Is a Partial Differential Equation?  466
  Modeling with Differential Equations  467
  Models at Different Scales  468
  The Parameters of a PDE  468
  Changing One Thing in a PDE Can Be a Big Deal  469
  Can AI Step In?  471
  Numerical Solutions Are Very Valuable  472
  Continuous Functions Versus Discrete Functions  472
  PDE Themes from My Ph.D. Thesis  474
  Discretization and the Curse of Dimensionality  477
  Finite Differences  478
  Finite Elements  484
  Variational or Energy Methods  489
  Monte Carlo Methods  490
  Some Statistical Mechanics: The Wonderful Master Equation  493
  Solutions as Expectations of Underlying Random Processes  495
  Transforming the PDE  495
  Fourier Transform  495
  Laplace Transform  498
  Solution Operators  499
  Example Using the Heat Equation  499
  Example Using the Poisson Equation  501
  Fixed Point Iteration  503
  AI for PDEs  509
  Deep Learning to Learn Physical Parameter Values  509
  Deep Learning to Learn Meshes  510
  Deep Learning to Approximate Solution Operators of PDEs  512
  Numerical Solutions of High-Dimensional Differential Equations  519
  Simulating Natural Phenomena Directly from Data  520
  Hamilton-Jacobi-Bellman PDE for Dynamic Programming  522
  PDEs for AI?  528
  Other Considerations in Partial Differential Equations  528
  Summary and Looking Ahead  530

14. Artificial Intelligence, Ethics, Mathematics, Law, and Policy  531
  Good AI  533
  Policy Matters  534
  What Could Go Wrong?  536
  From Math to Weapons  536
  Chemical Warfare Agents  537
  AI and Politics  538
  Unintended Outcomes of Generative Models  539
  How to Fix It?  539
  Addressing Underrepresentation in Training Data  539
  Addressing Bias in Word Vectors  540
  Addressing Privacy  540
  Addressing Fairness  541
  Injecting Morality into AI  542
  Democratization and Accessibility of AI to Nonexperts  543
  Prioritizing High Quality Data  543
  Distinguishing Bias from Discrimination  544
  The Hype  545
  Final Thoughts  546

Index  549
Preface

Why I Wrote This Book

AI is built on mathematical models. We need to know how. I wrote this book in purely colloquial language, leaving most of the technical details out. It is a math book about AI with very few mathematical formulas and equations, no theorems, no proofs, and no coding. My goal is to not keep this important knowledge in the hands of the very few elite, and to attract more people to technical fields.

I believe that many people get turned off by math before they ever get a chance to know that they might love it and be naturally good at it. This also happens in college or in graduate school, where many students switch their majors from math, or start a Ph.D. and never finish it. The reason is not that they do not have the ability, but that they saw no motivation or end goal for learning torturous methods and techniques that did not seem to transfer to anything useful in their lives. It is like going to a strenuous mental gym every day only for the sake of going there. No one even wants to go to a real gym every day (this is a biased statement, but you get the point). In math, formalizing objects into functions, spaces, measure spaces, and entire mathematical fields comes after motivation, not before. Unfortunately, it gets taught in reverse, with formality first and then, if we are lucky, some motivation.

The most beautiful thing about math is that it has the expressive ability to connect seemingly disparate things together. A field as big and as consequential as AI not only builds on math, as that is a given; it also needs the binding ability that only math can provide in order to tell its big story concisely. In this book I will extract the math required for AI in a way that does not deviate at all from the real-life AI application in mind. It is infeasible to go through existing tools in detail and not fall into an encyclopedic and overwhelming treatment. What I do instead is try to teach you how to think about these tools and view them from above, as a means to an end that we can tweak and adjust when we need to. I hope that you will get out of this book a way of seeing how things relate to each other and why we develop or use certain methods among others. In a way, this book provides a platform that launches you to whatever area you find interesting or want to specialize in.

Another goal of this book is to democratize mathematics, and to build more confidence to ask about how things work. Common answers such as “It’s complicated mathematics,” “It’s complicated technology,” or “It’s complex models” are no longer satisfying, especially since the technologies that build on mathematical models currently affect every aspect of our lives. We do not need to be experts in every field of mathematics (no one is) in order to understand how things are built and why they operate the way they do.

There is one thing about mathematical models that everyone needs to know: they always give an answer. They always spit out a number. A model that is vetted, validated, and backed with sound theory gives an answer, and a model that is complete trash also gives an answer. Both compute mathematical functions. Saying that our decisions are based on mathematical models and algorithms does not make them sacred. What are the models built on? What are their assumptions? Limitations? Data they were trained on? Tested on? What variables did they take into account, and what did they leave out? Do they have a feedback loop for improvement, ground truths to compare to and improve on? Is there any theory backing them up? We need to be transparent with this information when the models are ours, and ask for it when the models are deciding our livelihoods for us.

The unorthodox organization of the topics in this book is intentional. I wanted to avoid getting stuck in math details before getting to the applicable stuff. My stand on this is that we do not ever need to dive into background material unless we happen to be personally practicing something, and that background material becomes an unfulfilled gap in our knowledge that is stopping us from making progress. Only then is it worth investing serious time to learn the intricate details of things. It is much more important to see how it all ties together and where everything fits. In other words, this book provides a map for how everything between math and AI interacts nicely together.

I also want to make a note to newcomers about the era of large data sets. Before working with large data, real or simulated, structured or unstructured, we might have taken computers and the internet for granted. If we came up with a model or needed to run analytics on small and curated data sets, we might have assumed that our machine’s hardware would handle the computations, or that the internet would just give us more curated data when we needed it, or more information about similar models. The reality of limited access to data, errors in the data, errors in the outputs of queries, hardware limitations, storage, data flow between devices, and vectorizing unstructured data such as natural language or images and movies hits us really hard. That is when we start getting into parallel computing, cloud computing, data management, databases, data structures, data architectures, and data engineering in order to understand the compute infrastructure that allows us to run our models. What kind of infrastructure do we have? How is it structured? How did it