Mathematical Foundations for Deep Learning bridges the gap between theoretical mathematics and practical applications in artificial intelligence (AI). This guide delves into the fundamental mathematical concepts that power modern deep learning, equipping readers with the tools and knowledge needed to excel in the rapidly evolving field of artificial intelligence. Designed for learners at all levels, from beginners to experts, the book makes mathematical ideas accessible through clear explanations, real-world examples, and targeted exercises. Readers will master core concepts in linear algebra, calculus, and optimization techniques; understand the mechanics of deep learning models; and apply theory to practice using frameworks like TensorFlow and PyTorch. By integrating theory with practical application, Mathematical Foundations for Deep Learning prepares you to navigate the complexities of AI confidently. Whether you're aiming to develop practical skills for AI projects, advance to emerging trends in deep learning, or lay a strong foundation for future studies, this book serves as an indispensable resource for achieving proficiency in the field. Embark on an enlightening journey that fosters critical thinking and continuous learning. Invest in your future with a solid mathematical base, reinforced by case studies and applications that bring theory to life, and gain insights into the future of deep learning.
Mathematical Foundations for Deep Learning

Mehdi Ghayoumi
State University of New York
Designed cover image: Mehdi Ghayoumi

First edition published 2026
by CRC Press
2385 NW Executive Center Drive, Suite 320, Boca Raton, FL 33431

and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2026 Mehdi Ghayoumi

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-032-69073-5 (hbk)
ISBN: 978-1-032-69072-8 (pbk)
ISBN: 978-1-032-69074-2 (ebk)

DOI: 10.1201/9781032690742

Typeset in Times by Newgen Publishing UK
Contents

Preface
Acknowledgments
About the Author

Chapter 1 Introduction
Chapter 2 Linear Algebra
Chapter 3 Multivariate Calculus
Chapter 4 Probability Theory and Statistics
Chapter 5 Optimization Theory
Chapter 6 Information Theory
Chapter 7 Graph Theory
Chapter 8 Differential Geometry
Chapter 9 Topology in Deep Learning
Chapter 10 Harmonic Analysis for CNNs
Chapter 11 Dynamical Systems and Differential Equations for RNNs
Chapter 12 Quantum Computing

Bibliography
Index
Preface

Mathematical Foundations for Deep Learning is a guide to the key mathematical principles behind modern deep learning techniques. I hope this book brings clarity to these essential concepts in artificial intelligence (AI), enhancing both your theoretical understanding and practical skills.

In this book, we explore important mathematical areas crucial for deep learning, such as linear algebra, calculus, probability theory, and more. Each chapter balances theory with practice, offering examples and exercises to strengthen your grasp of the material. We delve into the mathematics that powers neural networks, optimization algorithms, and various deep learning architectures, aiming to connect complex theory with real-world applications.

The book is organized into 12 chapters, each focusing on a specific area of mathematics as it relates to deep learning. We start with foundational concepts to ensure that all readers, regardless of their background, have the tools needed to tackle more advanced topics. Our journey begins with linear algebra and multivariate calculus, the building blocks of deep learning models. These chapters lay the groundwork for understanding how data is represented and manipulated in neural networks. We then move on to probability theory and optimization, exploring how models learn from data and how their performance can be improved. In later chapters, we introduce subjects like information theory, graph theory, and differential geometry, which play important roles in designing and operating deep learning systems. We also cover advanced topics like topology, harmonic analysis for convolutional neural networks, and dynamical systems for recurrent neural networks, showing how these areas contribute to the latest research and applications in AI. Finally, we discuss quantum computing and its potential impact on the future of deep learning.
At the core of this book is a detailed look at how these mathematical foundations come together in the practical building of deep learning models. By understanding the underlying math, you'll be better equipped to solve complex problems and innovate in the field.

I wrote this book for a wide audience, from students new to deep learning to experienced professionals wanting to deepen their understanding of the math behind the models they use. My goal is to make the content accessible to everyone, with clear explanations, practical examples, and exercises to help you learn and apply what you've read.

Thank you for joining me on this journey into the mathematical heart of deep learning. I hope this book not only expands your knowledge but also inspires you to push the boundaries of what's possible in AI. Happy reading!

Mehdi Ghayoumi
Beverly Hills, CA, USA
August 2024
Acknowledgments

The creation of this book has been deeply influenced by the extraordinary individuals in my life. While many have contributed, none have been more significant than my beloved parents, to whom I offer my deepest gratitude.

First and foremost, I dedicate this book to my dear mother, Khadijeh Ghayoumi. Your unwavering belief in me, endless love, and constant support have been the foundation of my strength throughout this journey. This book is a humble tribute to the profound impact you've had on my life. The valuable lessons you've taught me continue to guide my path, and your inspiring resilience reminds me daily to persevere, no matter the challenges that arise.

In loving memory of my father, Aliasghar Ghayoumi, I also dedicate this work. Though you are no longer with us, your spirit continues to light my way. Your legacy of integrity, perseverance, and wisdom remains a guiding force in my life. This book honors the values you embodied, and I hope it stands as a testament to your lasting influence.

To both of you, my parents, I owe the principles that have shaped my academic journey: the relentless pursuit of knowledge, the importance of perseverance, and the virtue of humility. This book reflects those values and the love and guidance you've given me. I hope it makes you proud.

To my extended family, I express my heartfelt thanks. Your unwavering belief in me and your constant encouragement have been crucial in achieving this goal.

I also wish to sincerely thank my colleagues, mentors, and collaborators. Your wisdom, expertise, and guidance have been invaluable as I navigated the complexities of academia. The intellectual camaraderie we've shared has played a pivotal role in bringing this work to completion.

From the bottom of my heart, thank you.
About the Author

Mehdi Ghayoumi is a distinguished Assistant Professor at the Center for Criminal Justice, Intelligence, and Cybersecurity at the State University of New York (SUNY) Canton. His academic career is marked by excellence and leadership, reflecting a deep commitment to both teaching and research. Previously, as a Research Assistant Professor at SUNY Binghamton, he spearheaded innovative projects at the Media Core Lab, driving forward research and development in emerging technologies. At Kent State University, his exceptional teaching earned him the prestigious Teaching Award for two consecutive years, 2016 and 2017. Ghayoumi has been instrumental in developing several courses in fields such as machine learning, data science, robotics, and programming.

His broad and impactful research interests include Machine Learning, Machine Vision, Robotics, Human–Robot Interaction (HRI), and privacy. Focusing on the creation of practical and viable systems for real-world environments, his current multidisciplinary research spans HRI, manufacturing, biometrics, and healthcare.

Deeply involved in the academic community, Ghayoumi serves on technical program committees for numerous high-profile conferences and workshops. He is also a member of the editorial boards for several respected journals in machine learning, mathematics, and robotics. His influence reaches some of the most prestigious conferences in these domains, including ICML, ICPR, NeurIPS, HRI, FG, WACV, IROS, CIBCB, and JAI.

Ghayoumi's contributions are substantial and far-reaching. His research has been presented at leading conferences and published in top-tier journals, earning recognition for its depth and practical relevance. He has made significant strides in advancing HRI, Robotics Science and Systems (RSS), and machine learning applications, demonstrating his ongoing commitment to pushing the boundaries of knowledge and technology.
1 Introduction

1.1 INTRODUCTION

Welcome to Mathematical Foundations for Deep Learning, a journey into the core of mathematics and its impact on artificial intelligence (AI). Over the past 10 years, deep learning has changed industries and how we use technology. This book aims to make these complex ideas easier to understand by focusing on the main mathematical principles behind deep learning. Whether you're a computer scientist looking to deepen your knowledge, an artist exploring where creativity meets technology, or just someone who's curious, this book will show how mathematics is the backbone of AI. You'll learn how abstract concepts turn into real-world applications through clear examples, such as the role of linear algebra in neural networks and the use of calculus in optimization algorithms. This chapter gives an overview of the main topics we'll cover. It highlights how these mathematical ideas help build and improve deep learning models, preparing you for the detailed exploration ahead.

1.2 IMPORTANCE OF MATHEMATICS IN DEEP LEARNING

Deep learning is a field that combines many different areas, and mathematics plays a big role in it. But why is math so important for deep learning? Simply put, mathematics provides the foundation for everything in deep learning. It helps organize complex information, creates structure from random data, and gives meaning to the numbers we work with. If you want to design, understand, or improve deep learning algorithms, having a strong grasp of these mathematical ideas is not just useful but essential. Mathematics makes it possible to build effective models that can learn from data and make accurate predictions.

1.2.1 Structuring Chaos

Deep learning is about finding meaningful patterns hidden in large amounts of data. However, as there's so much data and it can be quite random, finding these patterns can feel like searching for a needle in a haystack.
This is where mathematics comes in to turn chaos into something we can manage and understand. Mathematical principles, especially linear algebra, provide systematic ways to process and represent data. Concepts like vectors, matrices, and tensors are fundamental in deep learning. They allow us to represent and manipulate data efficiently. This structured approach makes it easier to use computational techniques that speed up data analysis. For example, matrix operations let us transform entire datasets into a single, coherent action, greatly simplifying the data processing phase. When deep learning algorithms extract insights or patterns from data,
mathematics, particularly statistics and probability, provides the essential tools to measure these discoveries. Measures like mean, median, standard deviation, and correlation give us the language to describe and assess patterns, turning abstract data into clear insights. Moreover, mathematics plays a key role in optimizing deep learning models. Based on calculus, algorithms like gradient descent adjust model parameters step by step to minimize errors. This optimization process enables deep learning models to learn from data and improve their accuracy over time. Mathematics also offers strategies to handle outliers and prevent overfitting. Techniques like regularization help balance a model's complexity, improving its ability to generalize and leading to more reliable predictions in real-world applications.

1.2.2 Imposing Rules on Randomness

Deep learning often involves dealing with uncertainty, like the random starting weights in neural networks. This brings up an important question: How can we use this randomness to improve our model's predictions? Mathematics, with its clear yet flexible rules, offers a structured way to handle this uncertainty. A neural network begins by setting initial weights, which greatly affect how well it performs. These weights are usually given random values within a specific range, but this randomness isn't without control. Mathematical principles guide the choice of this range and how the initial weights are distributed, ensuring they're set up properly for effective learning. As the network trains, it adjusts these weights based on the errors it makes, a process where mathematics truly shines. Math provides systematic methods for updating these weights, which is crucial for learning effectively from randomness. At the heart of learning in neural networks is backpropagation, an algorithm based on the chain rule from calculus.
Backpropagation calculates the gradient of the loss function with respect to the network's weights. This gradient shows how to update the weights to minimize errors, steadily improving the network's performance. To put this in simple terms, imagine navigating a complex maze without a clear path. Backpropagation acts as a guide, giving you step-by-step directions on how to adjust your course at each turn. While randomness plays a key role in deep learning, the goal is for models to eventually find optimal solutions. Mathematics provides the rules and methods that make this convergence happen systematically and efficiently.

1.2.3 Infusing Data with Meaning

In its natural form, data is just a collection of unprocessed facts. Mathematics helps us turn this raw data into meaningful information by representing it as mathematical objects like vectors, matrices, or higher-dimensional tensors. This mapping allows us to measure relationships, calculate distances, and find patterns, which are essential steps in deep learning. A key aspect of deep learning is how we represent data. Whether we're dealing with images, text, or sounds, real-world data is often converted into numerical arrays or tensors. This transformation uses the power of linear algebra, enabling us to manipulate and analyze data efficiently. By adding structure to what might be chaotic information, mathematics makes it easier to manage and understand complex data. Mathematics also helps us quantify relationships between different data points. This could involve calculating correlations, finding dependencies, or uncovering hidden patterns using techniques like dimensionality reduction. Additionally, mathematical tools allow us to measure distances or differences between data points. Methods like Euclidean distance or cosine similarity tell us how similar or dissimilar items are.
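As a concrete illustration, both measures can be computed in a few lines of plain Python (a minimal sketch using only the standard library; the vectors are made-up example data, not taken from the book):

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points in n-dimensional space.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction,
    # 0.0 means orthogonal, -1.0 means opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]  # same direction as u, but twice as long

print(euclidean(u, v))          # ~3.742: the points are far apart...
print(cosine_similarity(u, v))  # ~1.0: ...yet the vectors point the same way
```

Note how the two measures answer different questions: the points are far apart in space, yet the vectors are perfectly similar in direction. This is why cosine similarity is often preferred for comparing items whose magnitude matters less than their orientation.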
This ability is vital in many deep learning applications, such as grouping similar items (clustering), detecting unusual data points (anomaly detection), and finding the closest matches in data (nearest-neighbor searches). By mapping data onto mathematical structures, we can identify patterns. For example, Fourier analysis can detect repeating
patterns in time-based data, while convolution operations in convolutional neural networks (CNNs) find spatial patterns in images. Recognizing these patterns is fundamental to learning from data and making accurate predictions. To bring this idea to life, imagine grouping data points based on their similarities. Without mathematical techniques to measure these similarities, effectively clustering data would be nearly impossible. Figure 1.1 illustrates how various mathematical constructs transform raw data into meaningful insights. The first subplot demonstrates the use of principal component analysis (PCA) to reduce the dimensionality of data, making it more interpretable and easier to visualize. In the second subplot, a heatmap reveals the Euclidean distances between data points, quantifying their dissimilarities and helping us understand the spatial relationships within the dataset. The third subplot displays a heatmap of cosine similarities, highlighting the degree of similarity between data points by focusing on their angular relationships. Finally, the K-means clustering subplot shows how data points are grouped into distinct clusters, revealing underlying patterns and structures in the dataset.

1.2.4 Designing and Interpreting Algorithms

Mathematical principles act like a compass, helping us choose the right functions, understand the effects of our choices, and keep improving existing algorithms. For example, consider how math is crucial when selecting activation functions for neural networks. Why are sigmoid functions often used? The answer lies in their unique mathematical properties. With their S-shaped curve, sigmoid functions accept any real number as input but always output a value between 0 and 1. This makes them perfect for situations where predictions, like probabilities, need to stay within a specific range. Understanding the math behind an algorithm also sheds light on how it behaves.
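The bounded, S-shaped behavior just described is easy to verify numerically (a small sketch in plain Python; the sample inputs are arbitrary):

```python
import math

def sigmoid(x):
    # Maps any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

for x in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    print(f"sigmoid({x}) = {sigmoid(x):.6f}")
# Every output lies strictly between 0 and 1: sigmoid(0) is exactly 0.5,
# while large positive or negative inputs saturate near 1 or 0.
```

That saturation at the extremes is exactly the property that makes the function useful for probability-like outputs, but it also flattens the curve there, which matters for training behavior.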
For instance, knowing that sigmoid functions squeeze extreme input values into a narrow range between 0 and 1 helps us understand why neural networks using them might face vanishing gradients. This is a phenomenon where the gradients become very small, slowing down learning and affecting training efficiency. Mathematics is not just foundational for enhancing and creating new algorithms but also for understanding current ones.

1.2.5 Improving Models

Mathematics is essential for designing and understanding deep learning models, and it plays a crucial role in improving their performance. By learning mathematical concepts like overfitting, underfitting, and regularization, we can identify issues in our models, optimize their performance, and enhance their reliability. Overfitting and underfitting are common challenges in machine learning. Understanding these problems is key to fixing weaknesses in our models. Mathematics provides us with tools to tackle these issues directly. For example, regularization is a mathematical technique that improves a model's ability to generalize by reducing its complexity, helping to prevent overfitting. Techniques like L1 and L2 regularization, each with their own benefits, are crucial for simplifying models and making them more effective. Knowing these techniques empowers us to choose the best one for our specific needs, further enhancing our model's performance. Mathematics also plays a vital role in optimizing model parameters. Methods like gradient descent adjust model parameters step by step to minimize errors efficiently. Additionally, understanding the bias-variance trade-off, a key concept in statistics, helps us balance a model's accuracy on training data (bias) with its ability to perform well on new data (variance). Finding the right balance between bias and variance is crucial for building accurate and robust models.
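To make the regularization idea concrete, here is a minimal sketch (plain Python, with made-up weight values) of how L1 and L2 penalties add a complexity cost to the data-fit loss, so that larger weights make the total objective worse:

```python
def l1_penalty(weights, lam):
    # L1 (lasso) term: lam * sum of absolute weights; encourages sparsity.
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # L2 (ridge) term: lam * sum of squared weights; discourages large weights.
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam, penalty):
    # Total objective = how well we fit the data + how complex the model is.
    return data_loss + penalty(weights, lam)

small = [0.1, -0.2, 0.3]
large = [3.0, -6.0, 9.0]  # same signs, 30x the magnitude
lam = 0.01

print(regularized_loss(1.0, small, lam, l2_penalty))  # ~1.0014: tiny penalty
print(regularized_loss(1.0, large, lam, l2_penalty))  # ~2.26: heavily penalized
```

During training, the optimizer minimizes this combined objective, so it is pushed toward smaller (L2) or sparser (L1) weight vectors, which is the mechanism behind the improved generalization described above.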
By applying mathematical insights, we can fine-tune our models to ensure they perform well both in theory and in real-world situations.
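The parameter-update loop behind gradient descent, mentioned several times in this section, can be sketched in a few lines (a toy example in plain Python; we minimize a simple one-dimensional quadratic rather than a real network loss):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    # Repeatedly step opposite the gradient: x <- x - lr * grad(x).
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2; its gradient is f'(x) = 2 * (x - 3),
# so the unique minimizer is x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # converges to ~3.0
```

The same loop, generalized to millions of parameters with gradients supplied by backpropagation, is what trains a deep network.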
FIGURE 1.1 Transforming data and identifying patterns.
1.3 BRIEF OVERVIEW OF DEEP LEARNING

1.3.1 Simulating Human-Like Learning

Deep learning takes inspiration from how humans learn, especially through experience. Just like a child recognizes shapes, sounds, or faces by seeing them repeatedly, deep learning models process vast amounts of data through layers of artificial neural networks (ANNs). These models gradually learn to detect patterns, make connections, and develop a deeper understanding of the input data. At the core of deep learning is the idea of learning from data. Unlike traditional programming, where machines follow specific instructions for every task, deep learning models learn by example. They start with raw, unprocessed data and improve their understanding as they process it. This enables them to make decisions or predictions without being explicitly programmed for each specific task. Much like humans, deep learning models improve with experience. They update their parameters with each training cycle to reduce prediction errors. The more data they are exposed to, the better they perform their tasks. This ongoing process mirrors human learning, where continuous practice leads to gradual improvement and mastery. By imitating the way humans learn, deep learning has become invaluable in many areas, from image recognition and natural language processing (NLP) to self-driving cars and medical diagnoses. Deep learning models can tackle complex problems that were once thought too difficult for artificial intelligence (AI).

1.3.2 Artificial Neural Networks

ANNs are the foundation of deep learning and are inspired by the neural networks in the human brain. These computational models imitate how the brain processes information but in a much simpler way. ANNs consist of interconnected layers of nodes, or "neurons", that work together to process information and learn patterns from data.
An ANN is typically organized into three main layers:

• Input Layer: This layer receives the raw data that the network will process. Each neuron here represents a feature or input variable from your dataset.
• Hidden Layers: These are the intermediate layers that perform complex transformations on the input data. An ANN can have one or more hidden layers, each containing many neurons. These neurons are connected to neurons in the previous and next layers through weighted connections, similar to synapses in the brain. When a network has multiple hidden layers, it's called a deep neural network.
• Output Layer: This final layer produces the network's output, the result of processing the data. The number of neurons in this layer depends on how many output variables you need.

Each neuron in an ANN performs a simple operation:

1. Receive Inputs: The neuron gets inputs from the neurons in the previous layer. Each input is multiplied by a specific weight.
2. Calculate Weighted Sum: The neuron sums up all these weighted inputs and adds a bias term.
3. Apply Activation Function: The result is passed through an activation function, like the sigmoid function or ReLU (rectified linear unit). This function introduces nonlinearity into the model, allowing the network to learn complex patterns that aren't just straight lines.

The real strength of ANNs comes from the way neurons and layers are interconnected. During training, the network adjusts the weights and biases of each neuron to minimize the difference between its predictions and the actual values. This adjustment is done through a process called backpropagation, which uses optimization algorithms like gradient descent. Backpropagation calculates the gradient of the loss function (which measures the error) with respect to each weight.
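The three numbered steps above can be sketched directly (a minimal plain-Python example; the input, weight, and bias values are arbitrary illustrations, not from the book):

```python
import math

def relu(z):
    # Rectified linear unit: passes positives through, zeroes out negatives.
    return max(0.0, z)

def sigmoid(z):
    # Squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    # Steps 1-2: multiply each input by its weight, sum, and add the bias.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step 3: apply the nonlinear activation function.
    return activation(z)

x = [0.5, -1.0, 2.0]  # incoming signals from the previous layer
w = [0.4, 0.3, -0.2]  # learned connection weights
b = 0.1               # learned bias term

print(neuron(x, w, b, relu))     # weighted sum is -0.4, so ReLU outputs 0.0
print(neuron(x, w, b, sigmoid))  # sigmoid(-0.4), roughly 0.40
```

A full layer is just many such neurons sharing the same inputs, and a network stacks layers so that each layer's outputs become the next layer's inputs.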
FIGURE 1.2 (a) Structure of an ANN, (b) neuron operation, and (c) backpropagation and optimization.
It then updates the weights to reduce this loss. ANNs are powerful because of their layered and interconnected structure, and they can model complex patterns and structures in data. They excel at finding hidden patterns and subtle relationships that traditional machine learning models might miss. This ability has led to major breakthroughs in areas such as computer vision, NLP, and speech recognition. Figure 1.2 illustrates the key components of ANNs. Figure 1.2a depicts the basic structure of an ANN, showcasing the input layer, one or more hidden layers, and the output layer. Figure 1.2b demonstrates the operation of a single neuron, showing how it processes inputs, applies weights and biases, and utilizes an activation function to produce an output. Figure 1.2c visualizes the backpropagation process, highlighting how the loss decreases over epochs as the model's parameters are optimized through iterative weight adjustments.

1.3.3 Applications of Deep Learning

Deep learning, a major part of AI, has transformed many fields in technology, research, and business through its wide-ranging applications. Its ability to process and learn from large amounts of data has led to breakthroughs once thought impossible. In computer vision, deep learning excels at image recognition, enabling models to quickly and accurately identify objects, people, and scenes in images. This power is behind facial recognition systems, self-driving cars, and medical imaging. For example, deep learning algorithms can detect tumors in medical scans with accuracy similar to experienced doctors. In NLP, deep learning has made significant strides. Algorithms can now understand and generate human language, allowing for sentiment analysis, language translation, text summarization, and question-answering. These advancements power everyday tools such as voice assistants, customer service chatbots, and real-time translation services, improving communication and accessibility worldwide.
Beyond recognition and understanding, deep learning fosters creativity through generative models like generative adversarial networks (GANs). GANs can produce new data that closely resembles their training data, creating realistic images, music, and even art. For instance, GANs can generate lifelike human faces of people who do not exist, which has exciting applications in entertainment, fashion, and virtual reality. In predictive analytics, deep learning is a powerful tool that can analyze vast amounts of data to forecast future events like stock prices, customer behavior, disease outbreaks, and natural disasters. Industries such as finance, marketing, healthcare, and disaster management use this predictive power to make informed, data-driven decisions with greater confidence. Additionally, reinforcement learning, a subset of deep learning, involves models that learn to make decisions by interacting with their environment. This approach has achieved remarkable success in game-playing AI, surpassing human champions in games like Go, Chess, and Poker. Reinforcement learning is also applied in robotics, helping robots learn to navigate environments and manipulate objects on their own, paving the way for more advanced automation and intelligent systems. These applications demonstrate the significant impact deep learning has across various fields, driving innovation and enhancing capabilities in many aspects of modern life. As deep learning continues to evolve, it holds the promise of unlocking new possibilities and addressing complex challenges that were once beyond our reach.

1.4 BOOK FEATURES AND STRUCTURE

1.4.1 Book Features

Math can often feel overwhelming, filled with strange symbols and complex rules, especially when you're new to deep learning. Concepts like vectors and matrices from linear algebra, the tricky operations of calculus, and the rules of probability are all deeply connected to deep learning.
At first glance, these ideas might seem scary, like a wall that’s hard to climb. That’s where this book comes in. It aims to guide you through the maze of math by breaking down these abstract concepts and