Large language models (LLMs) and diffusion models such as ChatGPT and DALL-E have unprecedented potential. Having been trained on public text and images on the internet, they can contribute to a wide variety of tasks. And with the barrier to entry greatly reduced, practically any developer can harness AI models to tackle problems previously unsuitable for automation.

With this book, you’ll gain a solid foundation in generative AI and learn how to apply these models in practice. When integrating LLMs and diffusion models into their workflows, most developers struggle to coax out reliable results for use in automated systems. Authors James Phoenix and Mike Taylor show you how prompt engineering principles will enable you to work effectively with AI in production.

This book explains:

• The five principles of prompting that are transferable across models and will continue to work in the future
• Applying generative AI to real-world examples using libraries and frameworks such as LangChain
• Evaluating OpenAI models such as GPT-4 and DALL-E 2 against alternatives, including open-source models, comparing strengths and weaknesses
• How these principles apply in practice in the domains of NLP, text and image generation, and code

James Phoenix has taught more than 60 data science bootcamps for General Assembly. Mike Taylor created the marketing agency Ladder, employing 50 people in the USA, UK, and EU. James and Mike teach generative AI courses through their company Vexpower.
Prompt Engineering for Generative AI
ISBN: 978-1-098-15343-4
US $79.99 | CAN $99.99
Praise for Prompt Engineering for Generative AI

The absolute best book-length resource I’ve read on prompt engineering. Mike and James are masters of their craft.
—Dan Shipper, cofounder and CEO, Every

This book is a solid introduction to the fundamentals of prompt engineering and generative AI. The authors cover a wide range of useful techniques for all skill levels from beginner to advanced in a simple, practical, and easy-to-understand way. If you’re looking to improve the accuracy and reliability of your AI systems, this book should be on your shelf.
—Mayo Oshin, founder and CEO, Siennai Analytics, early LangChain contributor

Phoenix and Taylor’s guide is a lighthouse amidst the vast ocean of generative AI. Their book became a cornerstone for my team at Phiture AI Labs, as we learned to harness LLMs and diffusion models for creating marketing assets that resonate with the essence of our clients’ apps and games. Through prompt engineering, we’ve been able to generate bespoke, on-brand content at scale. This isn’t just theory; it’s a practical masterclass in transforming AI’s raw potential into tailored solutions, making it an essential read for developers looking to elevate their AI integration to new heights of creativity and efficiency.
—Moritz Daan, founder and partner, Phiture Mobile Growth Consultancy
Prompt Engineering for Generative AI is probably the most future-proof way of future-proofing your tech career. This is without a doubt the best resource for anyone working in practical applications of AI. The rich, refined principles in here will help both new and seasoned AI engineers stay on top of this very competitive game for the foreseeable future.
—Ellis Crosby, CTO and cofounder, Incremento

This is an essential guide for agency and service professionals. Integrating AI with service and client delivery, using automation management, and speeding up solutions will set new industry standards. You’ll find useful, practical information and tactics in the book, allowing you to understand and utilize AI to its full potential.
—Byron Tassoni-Resch, CEO and cofounder, WeDiscover

A really interesting and informative read, mixing practical tips and tricks with some solid foundational information. The world of GenAI is developing at breakneck speed, and having a toolset that can deliver results, regardless of the foundational model being used, is worth its weight in gold!
—Riaan Dreyer, chief digital and data officer, Bank of Iceland

The authors expertly translate prompt engineering intricacies into a practical toolkit for text and image generation. This guide, spanning standard practices to cutting-edge techniques, empowers readers with practical tips to maximize generative AI model capabilities.
—Aditya Goel, generative AI consultant
Prompt Engineering for Generative AI
Future-Proof Inputs for Reliable AI Outputs at Scale

James Phoenix and Mike Taylor

Beijing • Boston • Farnham • Sebastopol • Tokyo
Prompt Engineering for Generative AI
by James Phoenix and Mike Taylor

Copyright © 2024 Saxifrage, LLC and Just Understanding Data LTD. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Nicole Butterfield
Development Editor: Corbin Collins
Copyeditor: Piper Editorial Consulting, LLC
Proofreader: Kim Wimpsett
Indexer: nSight, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

May 2024: First Edition

Revision History for the First Edition
2024-05-15: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098153434 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Prompt Engineering for Generative AI, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the authors and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-15343-4
[LSI]
Table of Contents

Preface

1. The Five Principles of Prompting
  Overview of the Five Principles of Prompting
  1. Give Direction
  2. Specify Format
  3. Provide Examples
  4. Evaluate Quality
  5. Divide Labor
  Summary

2. Introduction to Large Language Models for Text Generation
  What Are Text Generation Models?
  Vector Representations: The Numerical Essence of Language
  Transformer Architecture: Orchestrating Contextual Relationships
  Probabilistic Text Generation: The Decision Mechanism
  Historical Underpinnings: The Rise of Transformer Architectures
  OpenAI’s Generative Pretrained Transformers
  GPT-3.5-turbo and ChatGPT
  GPT-4
  Google’s Gemini
  Meta’s Llama and Open Source
  Leveraging Quantization and LoRA
  Mistral
  Anthropic: Claude
  GPT-4V(ision)
  Model Comparison
  Summary

3. Standard Practices for Text Generation with ChatGPT
  Generating Lists
  Hierarchical List Generation
  When to Avoid Using Regular Expressions
  Generating JSON
  YAML
  Filtering YAML Payloads
  Handling Invalid Payloads in YAML
  Diverse Format Generation with ChatGPT
  Mock CSV Data
  Explain It like I’m Five
  Universal Translation Through LLMs
  Ask for Context
  Text Style Unbundling
  Identifying the Desired Textual Features
  Generating New Content with the Extracted Features
  Extracting Specific Textual Features with LLMs
  Summarization
  Summarizing Given Context Window Limitations
  Chunking Text
  Benefits of Chunking Text
  Scenarios for Chunking Text
  Poor Chunking Example
  Chunking Strategies
  Sentence Detection Using SpaCy
  Building a Simple Chunking Algorithm in Python
  Sliding Window Chunking
  Text Chunking Packages
  Text Chunking with Tiktoken
  Encodings
  Understanding the Tokenization of Strings
  Estimating Token Usage for Chat API Calls
  Sentiment Analysis
  Techniques for Improving Sentiment Analysis
  Limitations and Challenges in Sentiment Analysis
  Least to Most
  Planning the Architecture
  Coding Individual Functions
  Adding Tests
  Benefits of the Least to Most Technique
  Challenges with the Least to Most Technique
  Role Prompting
  Benefits of Role Prompting
  Challenges of Role Prompting
  When to Use Role Prompting
  GPT Prompting Tactics
  Avoiding Hallucinations with Reference
  Give GPTs “Thinking Time”
  The Inner Monologue Tactic
  Self-Eval LLM Responses
  Classification with LLMs
  Building a Classification Model
  Majority Vote for Classification
  Criteria Evaluation
  Meta Prompting
  Summary

4. Advanced Techniques for Text Generation with LangChain
  Introduction to LangChain
  Environment Setup
  Chat Models
  Streaming Chat Models
  Creating Multiple LLM Generations
  LangChain Prompt Templates
  LangChain Expression Language (LCEL)
  Using PromptTemplate with Chat Models
  Output Parsers
  LangChain Evals
  OpenAI Function Calling
  Parallel Function Calling
  Function Calling in LangChain
  Extracting Data with LangChain
  Query Planning
  Creating Few-Shot Prompt Templates
  Fixed-Length Few-Shot Examples
  Formatting the Examples
  Selecting Few-Shot Examples by Length
  Limitations with Few-Shot Examples
  Saving and Loading LLM Prompts
  Data Connection
  Document Loaders
  Text Splitters
  Text Splitting by Length and Token Size
  Text Splitting with Recursive Character Splitting
  Task Decomposition
  Prompt Chaining
  Sequential Chain
  itemgetter and Dictionary Key Extraction
  Structuring LCEL Chains
  Document Chains
  Stuff
  Refine
  Map Reduce
  Map Re-rank
  Summary

5. Vector Databases with FAISS and Pinecone
  Retrieval Augmented Generation (RAG)
  Introducing Embeddings
  Document Loading
  Memory Retrieval with FAISS
  RAG with LangChain
  Hosted Vector Databases with Pinecone
  Self-Querying
  Alternative Retrieval Mechanisms
  Summary

6. Autonomous Agents with Memory and Tools
  Chain-of-Thought
  Agents
  Reason and Act (ReAct)
  Reason and Act Implementation
  Using Tools
  Using LLMs as an API (OpenAI Functions)
  Comparing OpenAI Functions and ReAct
  Use Cases for OpenAI Functions
  ReAct
  Use Cases for ReAct
  Agent Toolkits
  Customizing Standard Agents
  Custom Agents in LCEL
  Understanding and Using Memory
  Long-Term Memory
  Short-Term Memory
  Short-Term Memory in QA Conversation Agents
  Memory in LangChain
  Preserving the State
  Querying the State
  ConversationBufferMemory
  Other Popular Memory Types in LangChain
  ConversationBufferWindowMemory
  ConversationSummaryMemory
  ConversationSummaryBufferMemory
  ConversationTokenBufferMemory
  OpenAI Functions Agent with Memory
  Advanced Agent Frameworks
  Plan-and-Execute Agents
  Tree of Thoughts
  Callbacks
  Global (Constructor) Callbacks
  Request-Specific Callbacks
  The Verbose Argument
  When to Use Which?
  Token Counting with LangChain
  Summary

7. Introduction to Diffusion Models for Image Generation
  OpenAI DALL-E
  Midjourney
  Stable Diffusion
  Google Gemini
  Text to Video
  Model Comparison
  Summary

8. Standard Practices for Image Generation with Midjourney
  Format Modifiers
  Art Style Modifiers
  Reverse Engineering Prompts
  Quality Boosters
  Negative Prompts
  Weighted Terms
  Prompting with an Image
  Inpainting
  Outpainting
  Consistent Characters
  Prompt Rewriting
  Meme Unbundling
  Meme Mapping
  Prompt Analysis
  Summary

9. Advanced Techniques for Image Generation with Stable Diffusion
  Running Stable Diffusion
  AUTOMATIC1111 Web User Interface
  Img2Img
  Upscaling Images
  Interrogate CLIP
  SD Inpainting and Outpainting
  ControlNet
  Segment Anything Model (SAM)
  DreamBooth Fine-Tuning
  Stable Diffusion XL Refiner
  Summary

10. Building AI-Powered Applications
  AI Blog Writing
  Topic Research
  Expert Interview
  Generate Outline
  Text Generation
  Writing Style
  Title Optimization
  AI Blog Images
  User Interface
  Summary

Index
Preface

The rapid pace of innovation in generative AI promises to change how we live and work, but it’s getting increasingly difficult to keep up. The number of AI papers published on arXiv is growing exponentially, Stable Diffusion has been among the fastest-growing open source projects in history, and the Discord server of the AI art tool Midjourney has tens of millions of members, surpassing even the largest gaming communities. What most captured the public’s imagination was OpenAI’s release of ChatGPT, which reached 100 million users in two months, making it the fastest-growing consumer app in history. Learning to work with AI has quickly become one of the most in-demand skills.

Everyone using AI professionally quickly learns that the quality of the output depends heavily on what you provide as input. The discipline of prompt engineering has arisen as a set of best practices for improving the reliability, efficiency, and accuracy of AI models. “In ten years, half of the world’s jobs will be in prompt engineering,” claims Robin Li, the cofounder and CEO of Chinese tech giant Baidu. However, we expect prompting to be a skill required of many jobs, akin to proficiency in Microsoft Excel, rather than a popular job title in itself.

This new wave of disruption is changing everything we thought we knew about computers. We’re used to writing algorithms that return the same result every time—not so for AI, where the responses are nondeterministic. Cost and latency are real factors again, after decades of Moore’s law made us complacent in expecting real-time computation at negligible cost. The biggest hurdle is the tendency of these models to confidently make things up, dubbed hallucination, which forces us to rethink the way we evaluate the accuracy of our work.

We’ve been working with generative AI since the GPT-3 beta in 2020, and as the models progressed, many early prompting tricks and hacks became unnecessary.
Over time, a consistent set of principles emerged that remained useful with the newer models and worked across both text and image generation. We have written this book based on these timeless principles, helping you learn transferable skills that will continue to be useful no matter what happens with AI over the next five years. The key to working with AI isn’t “figuring out how to hack the prompt by adding one magic word to the end that changes everything else.” Rather, as OpenAI cofounder Sam Altman asserts, what will always matter is the “quality of ideas and the understanding of what you want.” While we don’t know whether we’ll still call it “prompt engineering” in five years, working effectively with generative AI will only become more important.

Software Requirements for This Book

All of the code in this book is in Python and was designed to be run in a Jupyter Notebook or Google Colab notebook. The concepts taught in the book are transferable to JavaScript or any other coding language if preferred, though the primary focus of this book is on prompting techniques rather than traditional coding skills.

The code can all be found on GitHub, and we will link to the relevant notebooks throughout. It’s highly recommended that you use the GitHub repository and run the provided examples while reading the book.

For non-notebook examples, you can run a script with the format python content/chapter_x/script.py in your terminal, where x is the chapter number and script.py is the name of the script. In some instances, API keys need to be set as environment variables, and we will make that clear.

The packages used update frequently, so install our requirements.txt in a virtual environment before running code examples. The requirements.txt file is generated for Python 3.9. If you want to use a different version of Python, you can generate a new requirements.txt from the requirements.in file found in the GitHub repository by running these commands:

`pip install pip-tools`
`pip-compile requirements.in`

For Mac users:

1. Open Terminal: You can find the Terminal application in your Applications folder, under Utilities, or use Spotlight to search for it.
2. Navigate to your project folder: Use the cd command to change the directory to your project folder.
For example: cd path/to/your/project.
3. Create the virtual environment: Use the following command to create a virtual environment named venv (you can name it anything): python3 -m venv venv.
4. Activate the virtual environment: Before you install packages, you need to activate the virtual environment. Do this with the command source venv/bin/activate.
5. Install packages: Now that your virtual environment is active, you can install packages using pip. To install packages from the requirements.txt file, use pip install -r requirements.txt.
6. Deactivate the virtual environment: When you’re done, you can deactivate the virtual environment by typing deactivate.

For Windows users:

1. Open Command Prompt: You can search for cmd in the Start menu.
2. Navigate to your project folder: Use the cd command to change the directory to your project folder. For example: cd path\to\your\project.
3. Create the virtual environment: Use the following command to create a virtual environment named venv: python -m venv venv.
4. Activate the virtual environment: To activate the virtual environment on Windows, use .\venv\Scripts\activate.
5. Install packages: With the virtual environment active, install the required packages: pip install -r requirements.txt.
6. Deactivate the virtual environment: To exit the virtual environment, simply type deactivate.

Here are some additional tips on setup:

• Always ensure your Python is up-to-date to avoid compatibility issues.
• Remember to activate your virtual environment whenever you work on the project.
• The requirements.txt file should be in the same directory where you create your virtual environment, or you should specify the path to it when using pip install -r.

Access to an OpenAI developer account is assumed, as your OPENAI_API_KEY must be set as an environment variable in any examples importing the OpenAI library, for which we use version 1.0. Quick-start instructions for setting up your development environment can be found in OpenAI’s documentation on their website. You must also ensure that billing is enabled on your OpenAI account, with a valid payment method attached, to run some of the code in the book.
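The macOS steps above can be condensed into a single shell session. This is a sketch rather than part of the book’s repository: the project path is a placeholder, and it assumes requirements.txt sits in the project root.

```shell
# Sketch of the macOS/Linux setup; replace path/to/your/project with your path.
cd path/to/your/project

# Create and activate an isolated environment named venv.
python3 -m venv venv
source venv/bin/activate

# Install the book's pinned dependencies into the environment.
pip install -r requirements.txt

# Leave the environment when you're done.
deactivate
```

On Windows, only the activation line differs (.\venv\Scripts\activate); the rest is unchanged.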
Unless stated otherwise, the examples in the book use GPT-4, though we do briefly cover Anthropic’s competing Claude 3 model, as well as Meta’s open source Llama 3 and Google Gemini.
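Because every example that imports the OpenAI library expects OPENAI_API_KEY to be set, it can help to fail fast with a clear message before the first API call. The helper below is our own sketch, not code from the book’s repository; the function name is hypothetical.

```python
import os


def require_openai_key(env=None):
    """Return the OpenAI API key, raising a clear error if it is missing.

    `env` defaults to os.environ; a plain dict can be passed in for testing.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Export it before running the examples, "
            "e.g. `export OPENAI_API_KEY=...` in your shell."
        )
    return key
```

Calling require_openai_key() at the top of a script surfaces a missing key immediately, rather than as an authentication error deep inside the first request.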
For image generation we use Midjourney, for which you need a Discord account to sign up, though these principles apply equally to DALL-E 3 (available with a ChatGPT Plus subscription or via the API) or Stable Diffusion (available as an API, or it can run locally on your computer if it has a GPU). The image generation examples in this book use Midjourney v6, Stable Diffusion v1.5 (as many extensions are still only compatible with this version), or Stable Diffusion XL, and we specify the differences when this is important.

We provide examples using open source libraries wherever possible, though we do include commercial vendors where appropriate—for example, Chapter 5 on vector databases demonstrates both FAISS (an open source library) and Pinecone (a paid vendor). The examples demonstrated in the book should be easily modifiable for alternative models and vendors, and the skills taught are transferable. Chapter 4 on advanced text generation is focused on the LLM framework LangChain, and Chapter 9 on advanced image generation is built on AUTOMATIC1111’s open source Stable Diffusion Web UI.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
  Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
  Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
  Shows commands or other text that should be typed literally by the user.

Constant width italic
  Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.
This element signifies a general note.

This element indicates a warning or caution.

Throughout the book we reinforce what we call the Five Principles of Prompting, identifying which principle is most applicable to the example at hand. You may want to refer to Chapter 1, which describes the principles in detail.

Principle Name
  This will explain how the principle is applied to the current example or section of text.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://oreil.ly/prompt-engineering-for-generative-ai.

If you have a technical question or a problem using the code examples, please send email to bookquestions@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Prompt Engineering for Generative AI by James Phoenix and Mike Taylor (O’Reilly). Copyright 2024 Saxifrage, LLC and Just Understanding Data LTD, 978-1-098-15343-4.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.
O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed. Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-889-8969 (in the United States or Canada)
707-827-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://www.oreilly.com/about/contact.html

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/prompt-engineering-generativeAI.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media.

Watch us on YouTube: https://youtube.com/oreillymedia.

Acknowledgments

We’d like to thank the following people for their contribution in conducting a technical review of the book and their patience in correcting a fast-moving target:

• Mayo Oshin, early LangChain contributor and founder at SeinnAI Analytics
• Ellis Crosby, founder at Scarlett Panda and AI agency Incremen.to
• Dave Pawson, O’Reilly author of XSL-FO
• Mark Phoenix, a senior software engineer
• Aditya Goel, GenAI consultant

We are also grateful to our families for their patience and understanding and would like to reassure them that we still prefer talking to them over ChatGPT.