Uploader: 高宏飞 (shared on 2026-03-26)

Author: Mehrdad Maleki

Understand deep learning foundations and Rust programming principles. Implement and optimize deep learning models in Rust, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). Develop practical deep learning applications to solve real-world problems, including natural language processing, computer vision, and speech recognition. Explore Rust’s safety features, including its strict type system and ownership model, and learn strategies to create reliable and secure AI software. Gain an understanding of the broader ecosystem of tools and libraries available for deep learning in Rust.

Who This Book Is For

A broad audience with varying levels of experience and knowledge, including advanced programmers with a solid foundation in Rust or other programming languages (Python, C++, and Java) who are interested in learning how Rust can be used for deep learning applications. It may also suit data scientists and AI practitioners who want to understand how Rust can enhance the performance and safety of deep learning models, even if they are new to the Rust programming language.

ISBN: 8868822075
Publisher: Apress
Publish Year: 2026
Language: English
Pages: 192
File Format: PDF
File Size: 9.7 MB
Text Preview (First 20 pages)

Deep Learning with Rust: Mastering Efficient and Safe Neural Networks in the Rust Ecosystem — Mehrdad Maleki
Mehrdad Maleki Naas, Kildare, Ireland ISBN 979-8-8688-2207-0 ISBN 979-8-8688-2208-7 (eBook) https://doi.org/10.1007/979-8-8688-2208-7 © Mehrdad Maleki 2026 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Apress imprint is published by the registered company APress Media, LLC, part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A. If disposing of this product, please recycle the paper.
To my dear wife, Somi, for her patience and constant support through every step of this journey. To my dear older son, Sami, whose encouragement gave me the strength to finish this book. To my dear little son, Soren, for all the moments we missed playing together while I was working—this book is for you, too. And to my dear parents, for the love and guidance that shaped who I am. This book is also yours.
Declarations

Competing Interests

The author has no competing interests to declare that are relevant to the content of this manuscript.
Contents

Part I: Foundations of Deep Learning in Rust

1 Introduction
  1.1 Introduction
  1.2 Introduction to Deep Learning and Rust
  1.3 Detailed Comparison of Programming Languages
  1.4 How to Use This Book
  1.5 Companion GitHub Repository for Source Code
  Problems

2 Introduction to Deep Learning in Rust
  2.1 Introduction
  2.2 Overview of Deep Learning
    2.2.1 Foundational Concepts in Deep Learning
    2.2.2 Applications of Deep Learning
    2.2.3 Why Language Choice Matters in Deep Learning
  2.3 The Rust Advantage in AI Development
  2.4 Setting Up Your Rust Environment for AI
    2.4.1 Installing Rust
    2.4.2 Tips for Using rustup
    2.4.3 Cargo: Rust’s Package Manager
    2.4.4 Installing Essential Libraries (Crates)
    2.4.5 Installing and Testing Linfa
    2.4.6 Optimizing Rust for AI Workflows

3 Rust Syntax for AI Practitioners (Optional)
  3.1 Introduction
  3.2 Rust Syntax and Concepts
    3.2.1 Basic Syntax
    3.2.2 Control Flow
    3.2.3 Functions and Return Values
  3.3 Structs and Enums for Data Representation
    3.3.1 Structs
    3.3.2 Implementing Methods for Structs
  3.4 Error Handling
    3.4.1 The Result Type
    3.4.2 The Option Type
    3.4.3 Error Propagation and the ? Operator
    3.4.4 Best Practices for Error Handling
  3.5 Memory Safety in AI Workflows
    3.5.1 Borrowing and References
    3.5.2 Memory Allocation and Deallocation
  3.6 The Ownership Model for Data Handling
    3.6.1 The Ownership Concept in Rust
    3.6.2 Clone and Copy Traits
    3.6.3 Using Ownership in AI Workflows
  Problems

4 Why Rust for Deep Learning?
  4.1 Introduction
  4.2 Why Rust?
  4.3 Lifetime and Scope in Rust and Their Importance in Deep Learning
  4.4 Performance Advantages of Rust in Deep Learning
    4.4.1 Why Rust Is Faster
    4.4.2 Example: CSV Data Preprocessing
  4.5 Concurrency and Parallelism in Rust for AI Workloads
    4.5.1 Performance Comparison: Rust vs. Python for Parallel Computation
    4.5.2 Benchmark Results
    4.5.3 Rust Code
    4.5.4 How Parallelism Works in Rust
    4.5.5 Python Code
    4.5.6 CPU Parallelism in the Age of GPU Compute
  4.6 Tooling and Ecosystem in Rust for Deep Learning
    4.6.1 Emerging Libraries in Rust
  Problems

Part II: Advancing with Rust in AI

5 Building Blocks of Neural Networks in Rust
  5.1 Introduction
  5.2 Basic Neural Network Architecture
    5.2.1 Implementing Perceptron
    5.2.2 Implementing XOR with Perceptrons
    5.2.3 Forward Propagation
    5.2.4 Feedforward Pass for a Three-Layer Neural Network
    5.2.5 Automatic Differentiation with autodiff Crate
    5.2.6 Backpropagation Using Automatic Differentiation
  5.3 Plotting Graphs in Deep Learning with plotters Crate
    5.3.1 Plotting Simulated Training Loss in Rust
    5.3.2 Scatter Plot with plotters Crate

6 Rust Concurrency in AI
  6.1 Introduction
  6.2 Concurrency vs. Parallelism
  6.3 Threads and Spawn in Rust
  6.4 Concurrency in Deep Learning Applications
    6.4.1 Concurrent Data Loading and Preprocessing
    6.4.2 Parallelizing Computation Across Layers
    6.4.3 Model Evaluation During Training
    6.4.4 Logging and Monitoring

7 Deep Neural Networks and Advanced Architectures
  7.1 Introduction
  Chapter Goal
  7.2 Designing and Implementing DNNs in Rust
  7.3 Convolutional Neural Networks (CNNs)
    7.3.1 CNN Building Blocks
    7.3.2 Implementing a Basic CNN in Rust
  7.4 Building a CNN From Scratch in Rust
    7.4.1 Step 1: Activation Functions
    7.4.2 Step 2: Loss Function
    7.4.3 Step 3: Convolution Operation
    7.4.4 Step 4: Convolution Backpropagation
    7.4.5 Step 5: Max Pooling
    7.4.6 Step 6: Max Pooling Backpropagation
    7.4.7 Step 7: Training the CNN Step by Step
    7.4.8 Using the Trained CNN for Prediction
  7.5 Recurrent Neural Networks (RNN)
    7.5.1 RNNs as Dynamical Systems
    7.5.2 Fixed-Size Input/Output RNNs
    7.5.3 Variable-Size Input/Output: Encoder-Decoder (Seq2Seq)
    7.5.4 Training RNNs
  7.6 A Minimal RNN in Rust with tch
    7.6.1 Context and Problem Statement
    7.6.2 Reading the Output
    7.6.3 How Each Line Mirrors the Equations
  7.7 Long Short-Term Memory (LSTM)
    7.7.1 Why RNNs Struggle with Long-Term Dependencies
    7.7.2 The LSTM Solution
    7.7.3 Intuition Behind the Gates
    7.7.4 Mathematical Formulation
    7.7.5 Training LSTMs
    7.7.6 Architecture
  7.8 Implementing LSTM in Rust over the One-Shift Example
    7.8.1 What Stays the Same
    7.8.2 What Changes (and Why)
    7.8.3 The Minimal Changes, Shown Side by Side
    7.8.4 Reading the Results
    7.8.5 Recap
  Problems

8 Generative Models and Transformers in Rust
  8.1 Introduction
  Chapter Goal
  8.2 Generative Adversarial Network (GAN)
    8.2.1 Min-Max Game
    8.2.2 Expectation for Real Data x
    8.2.3 Expectation for Fake Data G(z)
    8.2.4 Objective Function Interpretation
    8.2.5 The Min-Max Problem
    8.2.6 Equilibrium
  8.3 A Minimal GAN in Rust with tch: Explanation and Walk-Through
    8.3.1 High-Level Flow
    8.3.2 Full Code (for Reference)
    8.3.3 Explaining Each Part
    8.3.4 Notes and Tips
    8.3.5 What Success Looks Like
    8.3.6 Result Interpretation
  8.4 Transformers
    8.4.1 Architecture Overview
    8.4.2 Self-Attention Mechanism
    8.4.3 Positional Encoding
    8.4.4 Multi-Head Attention
    8.4.5 Feed-Forward Networks
  8.5 Transformers (A Meaningful Toy Task)
    8.5.1 Task Definition
    8.5.2 Results and Analysis
    8.5.3 Code Walk-Through (Piece by Piece)
  8.6 A Minimal Transformer for NLP in Rust
    8.6.1 What This Code Is Supposed to Do
    8.6.2 How It Works (High Level)
    8.6.3 Architecture
    8.6.4 Complete Code
    8.6.5 Code, Piece by Piece
  Problems

References
Index
About the Author

Dr. Mehrdad Maleki holds a Ph.D. in Theoretical Computer Science and a master’s degree in Mathematics. He is an accomplished AI scientist and researcher specializing in artificial intelligence, quantum computing, and cybersecurity. His work combines deep mathematical insight with practical engineering to design scalable, high-performance AI and quantum systems. Over the years, Dr. Maleki has led several R&D projects, contributing to more than ten patents in AI and quantum computing. His research and innovations span areas such as deep learning, foundation models, automatic differentiation, and scientific computing. Proficient in Python and Rust, he bridges the gap between theoretical research and real-world applications by transforming complex algorithms into impactful solutions.
About the Technical Reviewer

Andrew Rzeznik is a Senior Systems Engineer at Cloudflare, working on problems in networking. He has previously done programming work in cryptography, data processing, and factory automation. He holds a Ph.D. in Mathematics from MIT, where his research focused on atmospheric waves and deep-sea mining plumes. In his spare time he enjoys woodworking and being with his family.
Introduction

Artificial Intelligence (AI) and deep learning are among the most transformative technologies of our time. They are reshaping how we live, work, and interact with the world—driving innovations in finance, healthcare, manufacturing, and beyond. However, as deep learning models grow in scale and complexity, so do the challenges of implementing them efficiently, securely, and reliably.

This book, Deep Learning with Rust, is written to bridge the gap between theoretical understanding and high-performance implementation. It combines the mathematical and conceptual foundations of deep learning with the engineering precision of Rust—a modern programming language designed for safety, concurrency, and performance. By the end of this book, readers will not only understand how deep learning works but also how to build, optimize, and scale deep learning systems in Rust from the ground up.

Who This Book Is For

This book is intended for readers who are curious about how AI systems work under the hood and who wish to go beyond using existing libraries. It assumes a basic understanding of programming and mathematics (functions, calculus, and probability) but does not require prior knowledge of Rust or deep learning. It is particularly useful for:

• AI practitioners who want to explore Rust as a new, safer, and faster alternative to Python for implementing deep learning models.
• Developers and engineers interested in building efficient and reliable AI systems for production environments.
• Researchers and students looking to strengthen their understanding of deep learning fundamentals while learning how to implement them in a high-performance language.

Each chapter is self-contained yet builds progressively toward a comprehensive mastery of the subject.

What This Book Covers

The book is divided into two main parts, designed to take readers from basic concepts to advanced implementations.
• Part I—Foundations of Deep Learning in Rust introduces the principles of AI and deep learning, explores why language choice matters for performance and scalability, and shows how to set up a complete Rust environment for AI development. It covers essential Rust syntax, data structures, error handling, ownership, and memory management—all framed from an AI practitioner’s perspective.
• Part II—Advancing with Rust in AI moves from concepts to practice. It explains how to implement neural networks from scratch, build and train perceptrons, and extend these to more advanced architectures such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, Generative Adversarial Networks (GANs), and Transformers. Each chapter walks through mathematical derivations, Rust code examples, and visual outputs, helping readers understand both how models work and how to implement them safely and efficiently.

Throughout the book, you will also learn about automatic differentiation, concurrency and parallelism, and optimization workflows, all within the Rust ecosystem.

How to Use This Book

This book combines theory, code, and exercises. Each chapter begins with a clear conceptual explanation, followed by annotated examples in Rust and practice problems at the end. To get the most out of it:

• Work through the examples interactively.
• Modify the code, experiment with parameters, and observe how changes affect the output.
• Review the Problems sections to reinforce learning.
• Access the companion GitHub repository, which includes all source code, exercises, and updates aligned with the latest version of Rust and its machine learning crates.

This hands-on approach ensures that readers gain not only theoretical knowledge but also practical skills that can be applied directly in AI projects.

Why Rust for Deep Learning?
While most deep learning frameworks today are written in Python, their performance-critical components are implemented in C++ or CUDA. Rust offers a unified alternative: the speed of C, the safety of memory ownership, and the expressiveness of a modern language.

Rust’s features—such as strict compile-time checks, zero-cost abstractions, and safe concurrency—make it ideal for building scalable AI systems that avoid the pitfalls of memory leaks, segmentation faults, and data races. In addition, the Rust ecosystem has rapidly matured with crates like ndarray, linfa, and tch, which enable high-level AI development while maintaining low-level control.

Rust empowers developers to write fast, safe, and energy-efficient AI code, making it a compelling choice for the next generation of AI research and production.
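The safe-concurrency claim can be made concrete with a small sketch that uses only the standard library; the `parallel_sum` function and its input are illustrative examples, not code from this book's repository. Because ownership rules force the shared accumulator behind `Arc<Mutex<...>>`, an unsynchronized write from a worker thread is a compile-time error rather than a latent data race:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Sum several chunks of data in parallel. The shared total lives behind
// Arc<Mutex<...>>: the compiler rejects any attempt to mutate it from a
// spawned thread without taking the lock, ruling out data races.
fn parallel_sum(chunks: Vec<Vec<i64>>) -> i64 {
    let total = Arc::new(Mutex::new(0i64));
    let mut handles = Vec::new();
    for chunk in chunks {
        let total = Arc::clone(&total);
        handles.push(thread::spawn(move || {
            let partial: i64 = chunk.iter().sum();
            *total.lock().unwrap() += partial; // lock, add, unlock on drop
        }));
    }
    for handle in handles {
        handle.join().unwrap(); // propagate panics from worker threads
    }
    let result = *total.lock().unwrap();
    result
}

fn main() {
    let data = vec![vec![1, 2, 3], vec![4, 5], vec![6]];
    println!("{}", parallel_sum(data)); // prints 21
}
```

Deleting the `Mutex` and writing to a plain shared integer would not merely misbehave at runtime, as it can in C++; the borrow checker refuses to compile it.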
Final Thoughts

The journey through this book is both technical and conceptual. It starts with understanding what deep learning is and ends with implementing advanced models capable of solving real-world problems. Along the way, you will gain insight into how AI systems think, learn, and adapt—and how to translate those processes into efficient, safe, and modern code.

By mastering deep learning with Rust, you are not just learning another programming language—you are stepping into a new paradigm of reliable, high-performance AI development.
Part I Foundations of Deep Learning in Rust
Chapter 1
Introduction

1.1 Introduction

In this chapter, we introduce Rust and explain why it is our choice for deep learning implementation due to its memory safety, concurrency, and performance. We discuss the importance of studying deep learning, highlighting its wide applications, such as image recognition and natural language processing. We provide a road map for studying the book, emphasizing the sequential structure of chapters and the importance of practicing exercises. Supplementary materials, like the GitHub repository, offer access to source code examples and solutions, ensuring the code stays updated with the latest Rust versions. Engaging with these resources will enhance your proficiency in implementing deep learning models in Rust.

1.2 Introduction to Deep Learning and Rust

Artificial intelligence (AI) is transforming industries and everyday life, with deep learning driving much of this change. From applications that predict trends to systems that understand language, AI’s rapid advancement has opened up incredible possibilities. But with these advancements comes a critical need for AI systems that are not only high performing but also secure and reliable.

In recent years, breakthroughs in AI have been closely tied to improvements in machine learning techniques, particularly those that can process vast amounts of data. However, creating AI models that are both efficient and safe is a complex challenge. Ensuring that these systems can be implemented and scaled securely is essential, especially as AI technology becomes more integrated into sensitive areas of society and business.

This book introduces the fundamental concepts behind deep learning and explores various types of architectures and their applications. A key aspect we’ll cover is implementing these AI models using Rust—a programming language that stands out for its speed, security, and efficiency.
Rust is gaining traction in the AI community because it combines principles from several established programming languages, and its design makes it easier for developers to write code that is both performant and memory-safe.

You might wonder why Rust is suited for AI. Known for its robust performance and safe memory management, Rust enables developers to build systems that are both powerful and highly stable. We’ll discuss how Rust’s unique features, such as concurrency and efficient memory handling, make it an excellent choice for AI applications. Later in this chapter, we’ll look at why Rust can outperform
other languages in critical ways and how it offers a green, energy-efficient alternative for intensive computing tasks. The following chapters will dive deeper into the technical details and introduce key concepts, giving you a comprehensive foundation in both AI and Rust’s potential for AI development. For now, consider this a starting point—a view of what’s possible as we combine the strengths of advanced AI with the stability and efficiency Rust brings to the table.

1.3 Detailed Comparison of Programming Languages

To understand why Rust is an excellent choice for AI and deep learning, it’s important to compare it in detail with other programming languages. Table 1.1 presents a comparison based on three metrics: energy efficiency, time complexity, and memory usage.

1. Energy Efficiency

• C (1.00): As a low-level language, C is extremely energy efficient, offering fine-grained control over hardware resources.
• Rust (1.03): Rust’s energy efficiency is nearly on par with C, thanks to its ability to perform low-level system operations—such as direct memory management and fine-grained control over hardware resources—while maintaining strong compile-time safety and optimization features.
• C++ (1.34): While also close to C in terms of efficiency, C++’s additional features can sometimes introduce overhead.
• Java (1.95): Java’s energy consumption is higher due to its virtual machine and garbage collection, which add runtime overhead.
• Python (75.88): Python, an interpreted language with dynamic typing, is significantly less energy efficient. Its ease of use comes at the cost of higher energy consumption.
Table 1.1 Comparison of energy efficiency, execution time, and memory usage across programming languages (values normalized; C = 1.00 for energy and time) [8]

PL           Energy     Time      Mb
C              1.00     1.00     1.17
Rust           1.03     1.04     1.54
C++            1.34     1.56     1.34
Ada            1.70     1.85     1.47
Java           1.95     1.89     6.01
Pascal         2.14     3.02     1.00
Lisp           2.27     3.40     1.92
OCaml          2.40     3.09     2.82
Fortran        2.52     4.20     1.24
Haskell        3.10     3.55     2.54
C#             3.14     3.14     2.58
Go             3.23     2.83     1.05
F#             4.13     6.30     4.25
JavaScript     4.45     6.52     4.59
Racket         7.91    11.27     3.52
TypeScript    21.50    46.20     4.69
PHP           29.30    27.64     2.57
Ruby          69.91    59.34     3.97
Python        75.88    71.90     2.80
Perl          79.58    65.79     6.62
2. Time Complexity

• C (1.00): Known for its speed, C is often used in performance-critical applications.
• Rust (1.04): Rust’s performance is comparable to C’s, making it suitable for high-performance computing.
• C++ (1.56): Offers object-oriented features that can add complexity and slow down execution.
• Java (1.89): Bytecode interpretation and garbage collection can increase execution time.
• Python (71.90): Python’s interpreted nature and dynamic typing lead to slower execution times compared to compiled languages.

3. Memory Usage

• C (1.17 MB): C is known for its efficient memory usage.
• Rust (1.54 MB): Rust’s memory footprint is slightly higher than C’s due to its safety model,1 yet it remains highly efficient.
• C++ (1.34 MB): Similar to C, but object-oriented features can increase memory usage.
• Java (6.01 MB): Higher memory usage due to the JVM and garbage collection.
• Python (2.80 MB): Higher than C and Rust due to its runtime and dynamic typing.

This detailed comparison underscores why Rust is a strong candidate for implementing AI systems. Its balance of energy efficiency, time complexity, and memory usage makes it an optimal choice for high-performance applications where efficiency and speed are crucial.

To illustrate the power of Rust in the context of AI, consider an example where we implement a simple neural network for image classification. In Python, this might involve using libraries such as TensorFlow or PyTorch. While these frameworks are highly optimized at their core—relying on C, C++, and CUDA for the heavy numerical computation—the surrounding Python “glue code” that coordinates these operations can introduce overhead, especially in fine-grained or iterative workloads. Rust, on the other hand, allows developers to build neural networks directly with system-level control, combining high performance with memory safety.
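As a taste of that system-level control, here is a minimal sketch of a single artificial neuron, the building block of the networks covered in Part II, written against the standard library alone; the function name and the numbers are illustrative choices, not code from the book’s repository:

```rust
// Forward pass of a single artificial neuron: a weighted sum plus bias,
// followed by a ReLU activation. No external crates are required.
fn neuron_forward(inputs: &[f64], weights: &[f64], bias: f64) -> f64 {
    assert_eq!(inputs.len(), weights.len(), "shape mismatch");
    let z: f64 = inputs
        .iter()
        .zip(weights)
        .map(|(x, w)| x * w)
        .sum::<f64>()
        + bias;
    z.max(0.0) // ReLU: negative pre-activations are clamped to zero
}

fn main() {
    let inputs = [1.0, 2.0, 3.0];
    let weights = [0.5, -0.25, 0.25];
    // 1.0*0.5 + 2.0*(-0.25) + 3.0*0.25 + 0.25 = 1.0
    println!("{}", neuron_forward(&inputs, &weights, 0.25)); // prints 1
}
```

The slice types `&[f64]` make the function borrow its inputs instead of copying them, and the length assertion catches shape mismatches that in a dynamically typed setting would surface only as silent numerical errors.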
Moreover, Rust’s ecosystem includes crates (Rust’s term for libraries) such as ndarray for numerical computing, autograd for automatic differentiation, linfa, a machine learning framework that provides algorithms and utilities for common machine learning tasks, and tch-rs, which provides bindings for PyTorch, allowing Rust to seamlessly integrate with existing AI tools and frameworks.

1.4 How to Use This Book

This book is not an elementary text on programming, AI, or deep learning. Some prior programming experience is required to follow the material smoothly. While a basic understanding of neural networks is helpful, it is not mandatory. This book is aimed at an intermediate level, so secondary-school knowledge of mathematics, especially functions, probability, and differential calculus, is crucial for understanding the deep learning concepts introduced here.

Consider this book as a research project. Read each chapter while testing the code and configurations in your preferred operating system. We will provide instructions for installing Rust and its dependencies across various platforms, including Windows, macOS, and Linux, to ensure a smooth start regardless of your setup.

1 While Rust’s ownership and borrowing system eliminates many memory errors, its safe abstractions and initialization guarantees can result in marginally higher memory use compared to C’s fully manual memory management.
6 1 Introduction To get the most out of this book, practice the exercises at the end of each chapter before moving on to the next. Some exercises test the coding skills learned in the chapter, while others assess theoretical understanding. These exercises are designed to reinforce concepts and offer hands-on experience with Rust and deep learning. We also recommend exploring supplementary resources and engaging with the Rust and AI communities. Online forums, GitHub repositories, and Rust’s official documentation can provide valuable insights and support as you progress. Active participation in these communities will help you stay updated on the latest developments and best practices in both Rust and deep learning. The chapters in this book are structured sequentially, with each building on the knowledge introduced in previous ones. While it’s designed to be read in order, the structure allows some flexibility, enabling you to skip ahead or revisit certain chapters as needed. This approach ensures that you can navigate through the material in a way that best suits your learning style, while still gaining a cohesive understanding of Rust and deep learning concepts. 1.5 Companion GitHub Repository for Source Code There is a GitHub repository hosted by Apress that includes the latest version of the code from the book. The repository is updated every six months, with announcements made on the book’s website on Apress. The address of the repository is https://github.com/Apress/Deep-Learning-with-Rust?tab= readme-ov-file. The repository includes the source code for examples explained in the book. By accessing the repository, you can download and run the code on your local machine, enabling you to experiment with the examples and modify them to suit your needs. Furthermore, the repository provides a platform for collaboration. You can contribute to the repository by suggesting improvements, reporting issues, or even adding new examples and exercises. 
This collaborative approach enhances the learning experience and fosters a sense of community among readers. The advantage of a GitHub repository for the book is that the source code will be updated according to the latest version of Rust and its libraries. New developments in the crates used throughout the book will also be tracked, so you can adjust your code based on recent advancements. This ensures that the book remains relevant and up-to-date, even as the Rust language and its ecosystem continue to evolve.

To summarize, this book aims to equip you with the knowledge and skills necessary to implement deep learning algorithms in Rust. By understanding the core principles of deep learning and leveraging Rust’s unique features, you can build efficient, secure, and high-performance AI systems. As you progress through the chapters, you will gain a deeper appreciation for the interplay between theory and practice, ultimately becoming proficient in both deep learning and Rust.

Problems

1.1 List and explain at least three features of Rust that make it suitable for deep learning applications.
1.2 Compare Rust’s memory management with that of Java, focusing on the absence of garbage collection in Rust.
1.3 Illustrate how Rust’s concurrency model can prevent data races in AI applications.