Hands-On Generative AI with Transformers and Diffusion Models (Omar Sanseviero, Pedro Cuenca, et al.)


Praise for Hands-On Generative AI with Transformers and Diffusion Models

An essential technical guide that delivers clear, hands-on instructions for implementing stable diffusion and fine-tuning language models. A must-have for any AI developer’s bookshelf.
—Vicki Reyzelman, chief AI solutions architect, Mave Sparks

As a comprehensive and practical guide for anyone eager to master generative AI, this book blends theory with real-world applications. From the fundamentals of language models and diffusion techniques to advanced topics like fine-tuning and creating text-to-image applications, the authors provide actionable Python code and clear insights that empower readers to build, innovate, and stay ahead in this rapidly evolving field. Their expertise and hands-on approach make this book an invaluable resource for both beginners and experienced practitioners alike.
—Anil Sood, senior manager, Ernst & Young US

An invaluable guide that demystifies generative AI, blending practical insights with hands-on techniques and examples covering various domains. A must-read for those interested in the future in AI.
—Vishwesh Ravi Shrimali, an engineer in the automobile industry

The book is an incredibly well-crafted guide that makes complex AI concepts accessible to a wide spectrum of readers. The authors bring clarity to transformers and diffusion models, making this a fantastic read for anyone looking to truly understand the fundamental building blocks driving today’s generative AI.
—Sai M Vuppalapati, data and AI/ML platforms product manager, Tubi TV

This book is a treasure trove for anyone curious about the potential of AI-generated content. With a focus on solving relevant real life problems and hands-on guidance, it skillfully bridges complex concepts and makes generative AI approachable for enthusiasts and professionals alike. A must-read for anyone ready to dive into this dynamic field and explore the power of generative AI.
—Lipi Deepaakshi Patnaik, senior software developer, Zeta

This book is exactly what you need to get started with generative AI: from comprehensive explanations to thoughtful tips and do-it-yourself exercises, it has it all. An excellent guide for anyone looking to learn how to use, adapt and evaluate generative AI models.
—Luba Elliott, AI art curator, elluba.com

Omar, Pedro, Apolinário, and Jonathan present an impressive blend of technical depth and intuitive guidance, empowering readers to bring innovative generative AI solutions to life with clarity and purpose. Through clear explanations of transformers and diffusion models, their in-depth development and applications across text, images, and audio, they make the complex world of AI both accessible and actionable. This work equips the next generation of innovators to confidently navigate GenAI’s technical, ethical, and practical challenges.
—Aditya Goel, AI consultant

This book is excellent for anyone starting their journey with generative AI. The authors guide us through this complex topic in a simple and intuitive way.
—Zygmunt Lenyk, research engineer, Odyssey

Hands-On Generative AI with Transformers and Diffusion Models offers a comprehensive, accessible guide to the core concepts and applications of generative AI. The authors skilfully cover essential topics, from transformers and diffusion models to creative applications, making it a must-read for those looking to master GenAI technologies.
—Gourav Singh Bais, senior data scientist and senior technical content writer, Allianz Services

The essential guide for developers to master the tools and concepts behind the biggest AI revolution in the last decade. This is such a serious competitor to my own book that I fear for our royalties! —Lewis Tunstall, machine learning engineer at Hugging Face and coauthor of Natural Language Processing with Transformers

Hands-On Generative AI with Transformers and Diffusion Models Omar Sanseviero, Pedro Cuenca, Apolinário Passos, and Jonathan Whitaker

Hands-On Generative AI with Transformers and Diffusion Models
by Omar Sanseviero, Pedro Cuenca, Apolinário Passos, and Jonathan Whitaker

Copyright © 2025 Omar Sanseviero, Pedro Cuenca, Apolinário Passos, and Jonathan Whitaker. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (https://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Nicole Butterfield
Development Editor: Jill Leonard
Production Editor: Gregory Hyman
Copyeditor: Krsta Technology Solutions
Proofreader: Sharon Wilkey
Indexer: BIM Creatives, LLC
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

December 2024: First Edition

Revision History for the First Edition
2024-11-22: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098149246 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Hands-On Generative AI with Transformers and Diffusion Models, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-14924-6

[LSI]

Preface

Generative AI is a revolutionary technology that has rapidly transitioned from lab demos to real-world applications, impacting billions. It can create new content (images, text, audio, videos, and more) by learning patterns from existing data, thereby enhancing creativity, augmenting data, or assisting in many tasks. For instance, a generative AI model trained on music can compose new melodies, while one trained on text can generate stories or even programming code.

This book isn’t just for experts; it’s for anyone who wants to learn about this fascinating new field. We won’t focus on building models from scratch or diving straight into complicated mathematics. Instead, we’ll leverage existing models to solve real-world problems, helping you to build a solid intuition around how these techniques work and providing the foundation for you to keep exploring.

This hands-on approach, we hope, will help you get up and running quickly and efficiently with generative AI. You’ll learn how to use pretrained models, adapt them for your needs, and generate new data with them. You’ll also learn how to evaluate the quality of generated data and explore ethical and social issues that may arise from using generative AI. This exposure will allow you to stay up-to-date with new models and help you identify areas that you may want to explore more deeply.

Who Should Read This Book

Given the impressive products and news you might have seen about generative AI, it’s normal to be excited, or worried, about it! Whether you’re curious about how programs can generate images, want to train a model to tweet in your style, or are looking to gain a deeper understanding of products like ChatGPT, this book is for you. With generative AI, we can do all of that and many other things, including these:

Write summaries of news articles
Generate images based on a description
Enhance the quality of an image
Transcribe meetings
Generate synthetic speech in your voice style
Incorporate new subjects or styles into image-generation models, like creating images of “your cat dressed as an astronaut”

No matter your reason, you’ve decided to learn about generative AI, and this book will guide you through it.

Prerequisites

This book assumes that you are comfortable programming in Python and have a foundational understanding of what machine learning is, including basic usage of frameworks like PyTorch or TensorFlow. Having practical experience with training models is not required, but it will be helpful to understand the content with more depth. The following resources provide a good foundation for the topics covered in this book:

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd ed., by Aurélien Géron (O’Reilly)
Deep Learning for Coders with fastai and PyTorch by Jeremy Howard and Sylvain Gugger (O’Reilly)

If you feel intimidated by the prerequisites, don’t worry! The book is designed to enhance your intuition and provide a hands-on approach to help you get started.

What You Will Learn

This book is divided into three parts:

In Part I, “Leveraging Open Models”, we’ll introduce the fundamental building blocks of generative AI. You’ll learn how to use pretrained models to generate text and images. This part will help you understand the basics of the field and understand the big picture.

Part II, “Transfer Learning for Generative Models”, is all about fine-tuning, showcasing ways to take existing models and adapt them to your needs. We’ll walk you through how to teach a diffusion model a new concept, customize a transformer model to classify text and reply in conversations, and explore advanced techniques for working with large models on limited hardware. Don’t worry if this is the first time you read about transformer or diffusion models; you’ll learn about them soon.

In Part III, “Going Further”, we’ll extend the ideas from the previous parts, generating new modalities such as audio and getting creative with new applications.

After you’ve read this book, you’ll have a solid understanding of the methods and techniques on which generative applications are built.

How to Read This Book

We designed the book to be read in order, but we have kept the chapters as self-contained as possible so that you can jump around to the parts that interest you most. Many of the ideas covered in this book apply to multiple modalities, so even if you are interested in only one particular domain (such as image generation), you may still find it valuable to skim through the other chapters.

We’ve included exercises and code examples throughout the book, designed to help you get hands-on with the material. Try to complete these exercises as you go along, and where possible, see if you can adapt the examples to your use cases. Trying things out for yourself will help you build a much deeper understanding of the material.

Finally, most chapter summaries list additional resources for further reading. We encourage you to explore these resources to deepen your understanding of the topics covered in the book. You don’t need to read these resources before you progress to a new chapter; you can come back later, whenever you are ready to go deeper into the subjects that interest you.

Software and Hardware Requirements

To get the most out of this book, we highly recommend running the code examples as you read along. Experimenting with the code by making changes and exploring different scenarios will enhance your understanding.

Working with transformers and diffusion models can be computationally intensive, so having access to a computer with an NVIDIA GPU is beneficial. While a GPU is not mandatory, it will significantly speed up training times. You can use any of multiple online options, such as Google Colaboratory and Kaggle Notebooks. Follow these instructions to set up your environment and follow along:

Using Google Colab
Most code should work on any Google Colab instance. We recommend you use GPU runtimes for chapters with training loops.

Running code locally
To run the code on your computer, create a Python 3.10 virtual environment using your preferred method. As an example, you can do it with conda like this:

    conda create -n genaibook python=3.10
    conda activate genaibook

For optimal performance, we recommend using a CUDA-compatible GPU.[1] If you don’t know what CUDA is, don’t worry, we’ll explain it in the book.

Many support utilities and helper functions are used throughout the book. To access them, please install the genaibook package:

    pip install genaibook

This will, in turn, install the libraries required to run transformers and diffusion models, along with PyTorch, matplotlib, numpy, and other essentials.

All code examples and supplementary material can be found in the book’s GitHub repository. You can run all the examples interactively in Jupyter Notebooks, and the repository will be regularly updated with the latest resources.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

TIP
This element signifies a tip or suggestion.

NOTE
This element signifies a general note.

WARNING
This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://oreil.ly/handsonGenAIcode.

If you have a technical question or a problem using the code examples, please send email to support@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Hands-On Generative AI with Transformers and Diffusion Models by Omar Sanseviero, Pedro Cuenca, Apolinário Passos, and Jonathan Whitaker (O’Reilly). Copyright 2025 Omar Sanseviero, Pedro Cuenca, Apolinário Passos, and Jonathan Whitaker, 978-1-098-14924-6.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-827-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://oreilly.com/about/contact.html

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/handsonGenAI.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media.

Watch us on YouTube: https://youtube.com/oreillymedia.

State of the Art: A Moving Target

State of the art (SOTA) is used to describe the highest level of performance currently achieved in a particular task or domain. In the field of generative AI, the SOTA is constantly changing as new models are developed and new techniques are discovered. This book will provide you with a solid grounding in the fundamentals of generative AI, but by the time you read it, new models will have been released that outperform the ones we discuss here.

Rather than trying to chase the ever-shifting best, we’ve tried to focus on general principles that will help you understand how the models work in a way that will be useful even as the field continues to evolve. New models rarely come out of nowhere and often build on the ideas of previous models. By understanding the fundamentals, you’ll be better equipped to understand the latest developments as they happen.

Acknowledgments

We would like to express our deepest gratitude to the incredible O’Reilly team, particularly Jill Leonard, for her amazing guidance and support throughout this entire process. Special thanks to Nicole Butterfield, Karen Montgomery, Kate Dullea, Gregory Hyman, and Kristen Brown for their invaluable advice and contributions, from initial scoping to the creation of the beautiful cover and illustrations.

We are deeply grateful to our technical reviewers: Vishwesh Ravi Shrimali, David Mertz, Lipi Deepaakshi Patnaik, Luba Elliott, Anil Sood, Sai Vuppalapati, Ranjeeta Bhattacharya, Rajat Dubey, Bryan Bischof, Vladislav Bilay, Gourav Singh Bais, Aditya Goel, Lakshmanan Sethu Sankaranarayanan, Zygmunt Lenyk, Youssef Hosn, Vicki Reyzelman, Lewis Tunstall, Sayak Paul, and Vaibhav Srivastav. Their insightful feedback was instrumental in shaping this book.

We would also like to extend our gratitude to the Hugging Face team for their inspiration and collaboration, particularly Clémentine Fourrier for her insights on model evaluation, Sanchit Gandhi for his guidance on audio-related topics, and Leandro von Werra and Lewis Tunstall for helping us navigate the book-writing process. The Hugging Face team continues to inspire us with its brilliance and kindness, helping bring this project to life.

A heartfelt thank you to the countless friends, collaborators, and contributors who have shaped the open-source ecosystem that we are proud to be part of. We are grateful to the entire ML community for advancing the research, tools, and resources that form the heart of this book. This work was crafted in Jupyter Notebooks, and we owe special thanks to Jeremy Howard, Hamel Husain, and all the contributors to Quarto and nbdev for making this possible.

Jonathan

I am very grateful to the community of researchers and hackers sharing their ideas and pushing forward what is possible. To Jeremy Howard, Tanishq Abraham, and the rest of the fastdiffusion crew who came together to learn all we could about these ideas. And to my amazing coauthors, without whom this book could not have happened!

Apolinário

I am grateful to my coauthors Omar, Pedro and Jonathan for the co-creation of this book. Combining technology education and creativity has been a fun challenge to tackle. I thank my friends who understand and support me even when I come along to hang out carrying my laptop around and my Hugging Face colleagues for always being supportive.

Pedro

Writing a book is a lot of fun, but it unfairly exacts sacrifices from the people you love. I’m super lucky to have had the support of María José, my partner in life. She made it easy for me to work on it, and when I was stuck she helped with common sense reasoning that, frankly, is anything but common. I apologize to my Mom and Dad for always bringing my laptop when I visit, to my son Pablo for not exploring Hyrule or Eorzea as much as we’d have liked, and to my son Javier for sometimes talking too much about work and too little about life. They are the best.

I’m truly inspired by my amazing coauthors. I admire and look up to them and can’t believe how lucky I am to learn from them, every day. This extends to the Hugging Face folks, whose enthusiasm and humility provide a primordial soup where things happen, and to the open ML community at large, whose work is always advancing the field but not always getting the credit it deserves. Thank you.

Omar

Thank you, Michelle, for your constant encouragement throughout this process, for all the brainstorming sessions, and for your support over the past two years. I couldn’t have completed this project without you. Hikes are back on the table!

To my parents, Ana and Walter, thank you for nurturing my love for books from the very beginning and for supporting me to become the person I am today.

Lastly, I want to thank my amazing coauthors: Pedro, Poli, and Jonathan. This journey has been truly fun, and I’m so grateful that we accomplished this together.

[1] Rather than a GPU, you can also use the MPS device, which might work on Macs with Apple Silicon, but we have not tested this configuration extensively.
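
As a quick check of the hardware options mentioned in the setup instructions (a CUDA-compatible GPU, the MPS device on Apple Silicon, or plain CPU), the following minimal sketch reports which device PyTorch can see. It is not taken from the book's repository: the helper name `pick_device` is hypothetical, and the fallback to "cpu" when PyTorch is not installed is an assumption made so that the snippet runs in any environment.

```python
import sys


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", depending on what PyTorch reports.

    Hypothetical helper for illustration; not part of the genaibook package.
    """
    try:
        import torch
    except ImportError:
        # Assumption: with no PyTorch installed, report "cpu" instead of failing.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps only exists in newer PyTorch releases, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {version}, device: {pick_device()}")
```

If this prints `cpu` on a machine that has a GPU, the usual suspects are a CPU-only PyTorch build or a driver problem; in that case, a hosted GPU runtime such as Google Colab is a reasonable alternative.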
