Praise for Hands-On Large Language Models

This is an exceptional guide to the world of language models and their practical applications in industry. Its highly-visual coverage of generative, representational, and retrieval applications of language models empowers readers to quickly understand, use, and refine LLMs. Highly recommended!
—Nils Reimers, Director of Machine Learning at Cohere | creator of sentence-transformers

Jay and Maarten have continued their tradition of providing beautifully illustrated and insightful descriptions of complex topics in their new book. Bolstered with working code, timelines, and references to key papers, their book is a valuable resource for anyone looking to understand the main techniques behind how Large Language Models are built.
—Andrew Ng, founder of DeepLearning.AI
I can’t think of another book that is more important to read right now. On every single page, I learned something that is critical to success in this era of language models.
—Josh Starmer, StatQuest

If you’re looking to get up to speed in everything regarding LLMs, look no further! In this wonderful book, Jay and Maarten will take you from zero to expert in the history and latest advances in large language models. With very intuitive explanations, great real-life examples, clear illustrations, and comprehensive code labs, this book lifts the curtain on the complexities of transformer models, tokenizers, semantic search, RAG, and many other cutting-edge technologies. A must read for anyone interested in the latest AI technology!
—Luis Serrano, PhD, Founder and CEO of Serrano Academy
This book is a must-read for anyone interested in the rapidly-evolving field of generative AI. With a focus on both text and visual embeddings, it’s a great blend of algorithmic evolution, theoretical rigor, and practical guidance. Whether you are a student, researcher, or industry professional, this book will equip you with the use cases and solutions needed to level-up your knowledge of generative AI. Well done!
—Chris Fregly, Principal Solution Architect, Generative AI at AWS

In the heart of the GenAI revolution, this indispensable guide masterfully balances theory and practice, navigating the vast landscape of large language models to equip readers with the knowledge needed for immediate and transformative impact in the field of AI.
—Tarun Narayanan Venkatachalam, AI Researcher, University of Washington
Timely reading to get hands-on experience with language models.
—Emir Muñoz, Genesys

Hands-On Large Language Models brings clarity and practical examples to cut through the hype of AI. It provides a wealth of great diagrams and visual aids to supplement the clear explanations. The worked examples and code make concrete what other books leave abstract. The book starts with simple introductory beginnings, and steadily builds in scope. By the final chapters, you will be fine-tuning and building your own large language models with confidence.
—Leland McInnes, Researcher at the Tutte Institute for Mathematics and Computing
Finally, a book that not only avoids superficial coverage of large language models but also thoroughly explores the background in a way that is both accessible and engaging. The authors have masterfully created a definitive guide that will remain essential reading despite the fast-paced advancements in the field.
—Prof. DDr. Roman Egger, CEO of Smartvisions.at and Modul University Vienna
Hands-On Large Language Models

Language Understanding and Generation

Jay Alammar and Maarten Grootendorst
Hands-On Large Language Models

by Jay Alammar and Maarten Grootendorst

Copyright © 2024 Jay Alammar and Maarten Pieter Grootendorst. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Nicole Butterfield
Development Editor: Michele Cronin
Production Editor: Ashley Stussy
Copyeditor: Charles Roumeliotis
Proofreader: Kim Cofer
Indexer: BIM Creatives, LLC
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

September 2024: First Edition
Revision History for the First Edition

2024-09-10: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098150969 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Hands-On Large Language Models, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-098-15096-9 [LSI]
Preface

Large language models (LLMs) have had a profound and far-reaching impact on the world. By enabling machines to better understand and generate human-like language, LLMs have opened new possibilities in the field of AI and impacted entire industries.

This book provides a comprehensive and highly visual introduction to the world of LLMs, covering both the conceptual foundations and practical applications. From word representations that preceded deep learning to the cutting-edge (at the time of this writing) Transformer architecture, we will explore the history and evolution of LLMs. We delve into the inner workings of LLMs, exploring their architectures, training methods, and fine-tuning techniques. We also examine various applications of LLMs in text classification, clustering, topic modeling, chatbots, search engines, and more.

With its unique blend of intuition-building, applications, and illustrative style, we hope that this book provides the ideal foundation for those looking to explore the exciting world of LLMs.
Whether you are a beginner or an expert, we invite you to join us on this journey to start building with LLMs.

An Intuition-First Philosophy

The main goal of this book is to provide an intuition into the field of LLMs. The pace of development in the Language AI field is incredibly fast, and frustration can build when trying to keep up with the latest technologies. Instead, we focus on the fundamentals of LLMs and intend to provide a fun and easy learning process.

To achieve this intuition-first philosophy, we liberally make use of visual language. Illustrations will help give a visual identity to major concepts and processes involved in the learning process of LLMs.¹ With our illustrative method of storytelling, we want to take you on a journey to this exciting and potentially world-changing field.

Throughout the book, we make a clear distinction between representation and generative language models. Representation models are LLMs that do not generate text but are commonly used
for task-specific use cases, like classification, whereas generative models are LLMs that generate text, like GPT models. Although generative models are typically the first thing that comes to mind when thinking about LLMs, there is still much use for representation models. We also use the word “large” in large language models loosely and often elect to simply call them language models, as size descriptions are often rather arbitrary and not always indicative of capability.

Prerequisites

This book assumes that you have some experience programming in Python and are familiar with the fundamentals of machine learning. The focus will be on building a strong intuition rather than deriving mathematical equations. As such, illustrations combined with hands-on examples will drive the learning throughout this book. This book assumes no prior knowledge of popular deep learning frameworks such as PyTorch or TensorFlow, nor any prior knowledge of generative modeling.
If you are not familiar with Python, a great place to start is Learn Python, where you will find many tutorials on the basics of the language. To further ease the learning process, we made all the code available on Google Colab, a platform where you can run all of the code without the need to install anything locally.
Book Structure

The book is broadly divided into three parts. They are illustrated in Figure P-1 to give you a full view of the book. Note that each chapter can be read independently, so feel free to skim chapters you are already familiar with.
Part I: Understanding Language Models

In Part I of the book, we explore the inner workings of language models both small and large. We start with an overview of the field and common techniques (see Chapter 1) before moving on to two central components of these models, tokenization and embeddings (see Chapter 2). We finish this part of the book with an updated and expanded version of Jay’s well-known Illustrated Transformer, which dives into the architecture of these models (see Chapter 3). This part introduces many terms and definitions that are used throughout the book.
Figure P-1. All parts and chapters of the book.
Part II: Using Pretrained Language Models

In Part II of the book, we explore how LLMs can be used through common use cases. We use pretrained models and demonstrate their capabilities without the need to fine-tune them.

You will learn how to use language models for supervised classification (see Chapter 4), text clustering and topic modeling (see Chapter 5), semantic search with embedding models (see Chapter 6), and text generation (see Chapters 7 and 8), and how to extend the capabilities of text generation to the visual domain (see Chapter 9).

Learning these individual language model capabilities will equip you with the skill set to solve problems with LLMs and build increasingly advanced systems and pipelines.
Part III: Training and Fine-Tuning Language Models

In Part III of the book, we explore advanced concepts through training and fine-tuning all kinds of language models. We will explore how to create and fine-tune an embedding model (see Chapter 10), review how to fine-tune BERT for classification (see Chapter 11), and end the book with several methods for fine-tuning generation models (see Chapter 12).

Hardware and Software Requirements

Running generative models is generally a compute-intensive task that requires a computer with a strong GPU. Since those are not available to every reader, all examples in this book are made to run using an online platform, namely Google Colaboratory, often shortened to “Google Colab.” At the time of writing, this platform allows you to use an NVIDIA GPU (T4) for free to run your code. This GPU has 16 GB of VRAM (which is the memory of your GPU), which is the minimum amount of VRAM we expect for the examples throughout the book.