ChatGPT Principles and Architecture

Ge Cheng
Hunan National Applied Mathematics Center, Hunan, P.R. China
School of Mathematics and Computational Science, Xiangtan University, Hunan, P.R. China
Table of Contents

Cover image
Title page
Copyright
Preface
  Main Content of the Book
  Target Audience for This Book
  Contact the Author
Acknowledgments
Chapter 1. A new milestone in artificial intelligence—ChatGPT
  Abstract
  1.1 The development history of ChatGPT
  1.2 The capability level of ChatGPT
  1.3 The technical evolution of large language models
  1.4 The technology stack of large language models
  1.5 The impact of large language models
  1.6 The challenges of training or deploying large models
  1.7 The limitations of large language models
  1.8 Summary
Chapter 2. In-depth understanding of the transformer model
  Abstract
  2.1 Introduction to the transformer model
  2.2 Self-attention mechanism
  2.3 Multihead attention mechanism
  2.4 Feedforward neural network
  2.5 Residual connection
  2.6 Layer normalization
  2.7 Position encoding
  2.8 Training and optimization
  2.9 Summary
Chapter 3. Generative pretraining
  Abstract
  3.1 Introduction to generative pretraining
  3.2 Generative pretraining model
  3.3 The generative pretraining process
  3.4 Supervised fine-tuning
  3.5 Summary
Chapter 4. Unsupervised multitask and zero-shot learning
  Abstract
  4.1 Encoder and decoder
  4.2 GPT-2
  4.3 Unsupervised multitask learning
  4.4 The relationship between multitask and zero-shot learning
  4.5 The autoregressive generation process of GPT-2
  4.6 Summary
Chapter 5. Sparse attention and content-based learning
  Abstract
  5.1 GPT-3
  5.2 The sparse transformer
  5.3 Meta-learning and in-context learning
  5.4 Bayesian inference of concept distributions
  5.5 Thought chains
  5.6 Summary
Chapter 6. Pretraining strategies for large language models
  Abstract
  6.1 Pretraining datasets
  6.2 Processing of pretraining data
  6.3 Distributed training patterns
  6.4 Technical approaches to distributed training
  6.5 Examples of training strategies
  6.6 Summary
Chapter 7. Proximal policy optimization
  Abstract
  7.1 Traditional policy gradient methods
  7.2 Actor-Critic
  7.3 Trust region policy optimization
  7.4 Principles of the proximal policy optimization algorithm
  7.5 Summary
Chapter 8. Human feedback reinforcement learning
  Abstract
  8.1 Reinforcement learning in ChatGPT
  8.2 InstructGPT training dataset
  8.3 Training stages of human feedback reinforcement learning
  8.4 Reward modeling algorithms
  8.5 PPO in InstructGPT
  8.6 Multiturn dialogue capability
  8.7 The necessity of human feedback reinforcement learning
  8.8 Summary
Chapter 9. Low-resource domain transfer of large language models
  Abstract
  9.1 Self-instruct
  9.2 Constitutional artificial intelligence
  9.3 Low-rank adaptation
  9.4 Quantization
  9.5 SparseGPT
  9.6 Case studies
  9.7 Summary
Chapter 10. Middleware
  Abstract
  10.1 LangChain
  10.2 AutoGPT
  10.3 Competitors in middleware frameworks
  10.4 Summary
Chapter 11. The future path of large language models
  Abstract
  11.1 The path to strong artificial intelligence
  11.2 Data resource depletion
  11.3 Limitations of autoregressive models
  11.4 Embodied intelligence
  11.5 Summary
Index
Copyright

Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands
125 London Wall, London EC2Y 5AS, United Kingdom
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

Copyright © 2025 China Machine Press Co., Ltd. Published by Elsevier Inc. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

For accessibility purposes, images in electronic versions of this book are accompanied by alt text descriptions provided by Elsevier. For more information, see https://www.elsevier.com/about/accessibility.

Publisher’s note: Elsevier takes a neutral position with respect to territorial disputes or jurisdictional claims in its published content, including in maps and institutional affiliations.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies, and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

ISBN: 978-0-443-27436-7

For information on all Elsevier publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Mara Conner
Acquisitions Editor: Glyn Jones
Editorial Project Manager: Naomi Robertson
Production Project Manager: Selvaraj Raviraj
Cover Designer: Greg Harris

Typeset by MPS Limited, Chennai, India
Preface

As a university computer science researcher and a veteran entrepreneur, I was profoundly impressed by experiencing firsthand the logical reasoning capabilities emerging from ChatGPT. Although many celebrate the efficiency gains in multimodal content creation brought by generative artificial intelligence (AI), the reasoning abilities displayed by ChatGPT are often underestimated. This capability enables ChatGPT to serve not only as the core of a new generation of human–computer interaction but also as an intelligent agent for building automated and semiautomated workflows. It may even extend into industrial control and robotics, thereby triggering profound social changes.

Many underestimate the impact of this transformation. Given the current pace of R&D and commercial application iterations, I expect that it will gradually permeate all aspects of human life and production over the next 3–5 years, greatly enhancing existing productivity and thereby setting off a series of changes. If asked to name the last era deserving the label “major technological transformation,” many would point without hesitation to the dawn of the internet. This transformation will likewise reshape business models related to content production, change existing work methods, and even drive changes in methods of production. Of course, this still depends on whether the next generation of large language models can achieve breakthroughs in the controllability of content output.

Main Content of the Book

This book is designed to help readers deeply understand ChatGPT and its related technologies. It consists of 11 chapters.

Chapter 1 provides an in-depth analysis of the technological evolution of large language models, their supporting technologies, and their technology stacks, and discusses their significant impact on society.

Chapter 2 elaborates on the theoretical foundations and main components of the Transformer model, revealing the principles and applications behind these technologies.

Chapter 3 delves into the generative pretraining process and principles of GPT.

Chapter 4 explores technologies such as layer normalization, orthogonal initialization, and reversible tokenization in GPT-2, and provides a detailed analysis of GPT-2’s autoregressive generation process.

Chapter 5 introduces GPT-3’s sparse attention mechanisms, along with the concepts of meta-learning and content-based learning, and discusses the application of Bayesian inference to concept distributions.

Chapter 6 details the pretraining datasets and data-processing methods for large language models, as well as distributed training patterns and architectures.

Chapter 7 analyzes the fundamental principles of the proximal policy optimization (PPO) algorithm.

Chapter 8 focuses on the fine-tuning datasets used in reinforcement learning from human feedback (RLHF) and the application of PPO in InstructGPT, discussing multiturn dialogue capability and the necessity of human feedback reinforcement learning.

Chapter 9 explores how to transfer large language models to specific domains at low resource cost.

Chapter 10 introduces middleware frameworks for large language models.

Chapter 11 looks ahead to the future development trends of large language models.

Target Audience for This Book

• Product managers in the AI field: For product managers looking to incorporate AI features into their products, understanding the basic principles and operational mechanisms of large language models like ChatGPT is crucial. From this book, they can learn about the design philosophies and construction methods of large language models, as well as how to integrate these models into their products. They can also better understand the performance bottlenecks of their products, which aids in more precise product planning.

• Researchers in AI-related fields: For AI researchers, this book can serve as a textbook for developing a deep understanding of large language models. Whether it is the details of the Transformer model or tips on training and optimizing GPT models, this book provides thorough explanations. More importantly, it explores some cutting-edge research areas, such as human feedback reinforcement learning and bootstrap labeling algorithms.

• Engineers specializing in large-scale data processing and analysis: For engineers facing challenges such as efficiently processing large-scale data or building distributed training architectures, this book offers many valuable suggestions and ideas. For example, Chapter 6 delves deeply into data processing and distributed training patterns.

• AI enthusiasts and technologically savvy individuals: If you are an AI technology enthusiast or someone who uses technology to improve everyday life, this book is also for you. Its introduction to large language models is easy to understand and provides a comprehensive overview of this powerful technology. It also offers many practical usage tips and case studies that can be applied directly to your life or work.

Contact the Author

Given my limited writing skills, there are inevitably some inadequacies in this book. If you have any questions or suggestions while reading, you can contact me via email at chenggextu@hotmail.com. I greatly look forward to your feedback, as it will be immensely helpful for my future writing. I hope you gain profound insights and deepen your understanding of large language models and AI while reading this book.
Acknowledgments

First, I would like to thank my family. During the writing of this book, the time I spent with them was greatly reduced, but they always provided support and understanding, allowing me to dedicate myself fully to writing without any concerns.

I am grateful to the editors, Yang Fuchuan and Chen Jie. The smooth publication of this book would not have been possible without their professionalism and meticulous attitude toward their work.

Finally, I must thank my graduate students Yin Zhibin, Luo Qifan, Yu Zhiwen, Yu Jiangnan, and Yang Jin. They created numerous illustrations for this book, and I am sincerely grateful for their contributions!

Ge Cheng
CHAPTER 1
A new milestone in artificial intelligence—ChatGPT

Abstract

This chapter provides an in-depth exploration of the evolution, capabilities, and impact of ChatGPT, a groundbreaking artificial intelligence (AI) application developed by OpenAI. The chapter traces the development history of ChatGPT from its early predecessors, such as GPT-1 and GPT-2, to the more advanced GPT-3 and GPT-4 models. It highlights the technological advancements that have made ChatGPT a powerful tool capable of complex tasks such as language comprehension, code generation, and multimodal reasoning. The chapter also discusses the architecture underpinning large language models (LLMs), focusing on the transition from traditional natural language processing techniques to Transformer-based models. Additionally, it addresses the significant computational and data requirements
for training these models, alongside challenges such as model interpretability, biases, and privacy concerns. The chapter concludes by examining the broader implications of LLMs across various sectors and anticipates future trends in AI development.

Keywords

ChatGPT; OpenAI; transformer model; natural language processing (NLP); artificial general intelligence (AGI); GPT series; human feedback reinforcement learning (HFRL); large language models (LLMs); computational power; model limitations

In November 2022, OpenAI introduced ChatGPT, an artificial intelligence (AI)-powered chat application with diverse functionalities. This application displays intellectual prowess that rivals, and sometimes surpasses, human performance across numerous professional and academic measures. Upon its launch, ChatGPT garnered widespread acclaim within the technology sector, marking a significant advancement in AI.

1.1 The development history of ChatGPT

For more than 50 years, the quest to enable computers to communicate like humans has been a significant focus in technology. The general public’s acquaintance with chat applications traces back to 1966, when MIT Professor Joseph Weizenbaum created ELIZA, an early program that simulated conversation.