Python GPT Cookbook
75+ practical recipes for building NLP solutions for the real world
Dr. Neil Williams
www.bpbonline.com
First Edition 2025
Copyright © BPB Publications, India
ISBN: 978-93-65892-062
All Rights Reserved. No part of this publication may be reproduced, distributed or transmitted in any form or by any means or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception of the program listings, which may be entered, stored and executed in a computer system but cannot be reproduced by means of publication, photocopy, recording, or any electronic or mechanical means.
LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY
The information contained in this book is true and correct to the best of the author’s and publisher’s knowledge. The author has made every effort to ensure the accuracy of this publication, but the publisher cannot be held responsible for any loss or damage arising from any information in this book.
All trademarks referred to in the book are acknowledged as properties of their respective owners, but BPB Publications cannot guarantee the accuracy of this information.
www.bpbonline.com
Dedicated to
My family and friends
About the Author
Dr. Neil Williams is a data scientist and AI strategist with a passion for transforming complex challenges into actionable insights. With a background spanning econometrics, machine learning, and Gen AI, he specializes in leveraging Python, Elastic, and LLMs to build scalable, high-impact solutions. Neil has worked as a consultant for nearly 30 years; his client engagements include Volkswagen, National Grid, Royal Mail, Capgemini, The R&A, Man City FC, Elexon, UKRI, Electricity Company of Ghana, Duni, BIM Object, and many others. From guiding enterprises through AI adoption to architecting intelligent risk management platforms, Neil combines deep technical expertise with sharp business acumen. His work spans many industries, with a particular focus on energy, and in all of them he has championed data-driven decision-making. Whether pioneering new applications of GPT in enterprise environments or mentoring the next generation of data scientists, he thrives at the intersection of technology, strategy, and innovation. In 1998, Neil was awarded a PhD for his research in computer vision applied to subsea robotics. His broad experience and academic rigour have given him a deep understanding of the technical aspects of AI and its application across different industries.
About the Reviewer
Naresh Dulam is a visionary technology leader with deep expertise in data analytics, cloud computing, and artificial intelligence. With a career spanning influential roles across the healthcare, telecom, and financial sectors, he has led the development of transformative analytics platforms that deliver sustainable, impactful solutions to evolving industry needs. As a forward-thinking innovator, Naresh combines technical expertise with a passion for knowledge-sharing, mentoring aspiring professionals, and fostering ethical practices. His commitment to driving innovation and creating meaningful impact is matched by his vision of inspiring the next generation of technologists. Beyond his professional achievements, Naresh enjoys exploring nature on adventurous expeditions and empowering others to achieve their full potential.
Acknowledgement
Thank you to everyone who has played a part in making the Python GPT Cookbook a reality. Several people have contributed to the completion of this book, which was by no means a solo effort. First and foremost, I extend my heartfelt appreciation to my family and friends for their unwavering support throughout this journey. Their love and encouragement have been a constant source of motivation. Special thanks to my close business associates. Your insights and feedback have been instrumental in shaping the content and improving the quality of this book. I am immensely grateful to BPB Publications for their guidance and expertise in bringing this book to fruition. Their support and assistance were invaluable in navigating the complexities of the publishing process. The reviewers, technical experts, and editors who provided valuable feedback and contributed to the refinement of this manuscript truly deserve a shout-out. Your insights and suggestions have significantly enhanced the quality of the book. Finally, I want to express my gratitude to the people I have met at networking events who have shown interest in my book. Your support and feedback have been deeply appreciated.
Preface
Through practical examples, useful explanations, and a structured approach, this book aims to equip readers with a solid understanding of creating solutions with GPT and Python. It is a practical guide designed for professionals, including product managers, data scientists, machine learning engineers, NLP engineers, and ML researchers, who have prior knowledge of Python and some understanding of machine learning. Each chapter includes recipes and practical tips to help readers deepen their understanding and apply concepts. By the end of the book, readers will have gained a strong understanding of GPT and how to apply the technology to real-world natural language processing tasks, e.g., text summarization, content generation, programming, chatbots, and conversational agents.
Logically, the book is organized into the following parts:
1. Basics: Chapters 1 and 2 provide background information and introduce some of the main tools needed throughout the book.
2. Foundations: Chapters 3 to 8 focus on the key components of GPT-powered solutions.
3. GPT in the Field: Chapters 9 to 13 form the heart of the book. They include code examples and practical tips for a variety of domains, including manufacturing, marketing, sales, and intellectual property law.
4. Conclusions: The final two chapters, 14 and 15, wrap up the book.
Chapter 1: Introduction to GPT - Introduces the history of GPT, explores OpenAI’s developer platform, and sets the stage for the rest of the book.
Chapter 2: Crafting Your GPT Workspace - Provides a step-by-step guide to setting up a Python development environment, including Visual Studio Code and Jupyter. Hugging Face and spaCy are also introduced.
Chapter 3: Pre-processing - Covers various techniques for preparing text data for embedding in GPT models. Topics include tokenization, lowercasing, encoding, lemmatization, etc.
Chapter 4: Embeddings - Explains how transforming discrete data, such as words or items, into continuous vectors plays a significant role in natural language processing and GPT.
Chapter 5: Classifying Intent - Shows how to teach machines to understand intent, a cornerstone of conversational AI and smart automation.
Chapter 6: Hugging Face and GPT - Looks more deeply at Hugging Face, a platform that has become an indispensable tool for working with GPT models. Although Hugging Face's Transformers library is introduced in earlier chapters, this chapter explores its features and functionality in more detail.
Chapter 7: Vector Databases - Explores vector databases, which are instrumental in information retrieval tasks and in solutions requiring similarity search.
Chapter 8: GPT, PyTorch, and TensorFlow - Looks at two open-source software libraries that are critical for building, training, and deploying GPT models.
Chapter 9: Custom GPT Actions - Introduces how to develop Custom GPT Actions, including various techniques for authentication.
Chapter 10: Integrating GPT with the Enterprise - Studies the dynamic and complex world of integrating GPT within the enterprise environment, through the lens of the fictional scale-up 10X Batteries.
Chapter 11: Marketing and Sales with GPT - Explores some of the opportunities to apply GPT in business development. From generating initial ideas and drafts using GPT to refining content through human review and structured processes, this chapter sketches out the socio-technical system that promises to accelerate growth at 10X Batteries.
Chapter 12: Intellectual Property Management with GPT - Shows how 10X Batteries' IP function uses GPT as a force multiplier within the world of patents, trademarks, copyrights, and trade secrets.
Chapter 13: GPT in Manufacturing - Looks at the role of GPT and Python in taking Smart Factory concepts from theory to practice, showing how 10X Batteries is driving efficiencies and innovations in its manufacturing function.
Chapter 14: Scaling up - Takes a high-level look at deploying GPT models efficiently and reliably, and provides an overview of the art of effectively debugging, optimizing, and maintaining GPT-powered solutions.
Chapter 15: Emerging Trends and Future Directions - Connects the recipes and knowledge from throughout the rest of the book to the broader landscape in which Python and GPT sit. Shines a light on emerging trends and topics for further consideration, and establishes a mental model for thinking about AI now and into the future.
Code Bundle and Coloured Images
Please follow the link to download the Code Bundle and the Coloured Images of the book:
https://rebrand.ly/e93c66
The code bundle for the book is also hosted on GitHub at https://github.com/bpbpublications/Python-GPT-Cookbook. If the code is updated, the changes will be made in the existing GitHub repository.
We have code bundles from our rich catalogue of books and videos available at https://github.com/bpbpublications. Check them out!
Errata
We take immense pride in our work at BPB Publications and follow best practices to ensure the accuracy of our content and to provide an engaging reading experience for our subscribers. Our readers are our mirrors, and we use their input to reflect on and correct any human errors that may have occurred during the publishing process. To help us maintain quality and reach any readers who might be having difficulties due to unforeseen errors, please write to us at:
errata@bpbonline.com
Your support, suggestions, and feedback are highly appreciated by the BPB Publications family.
Did you know that BPB offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.bpbonline.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at business@bpbonline.com for more details.
At www.bpbonline.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on BPB books and eBooks.
Piracy
If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at business@bpbonline.com with a link to the material.
If you are interested in becoming an author
If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit www.bpbonline.com. We have worked with thousands of developers and tech professionals, just like you, to help them share their insights with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at BPB can learn what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about BPB, please visit www.bpbonline.com.
Join our book’s Discord space
Join the book’s Discord workspace for the latest updates, offers, tech happenings around the world, new releases, and sessions with the authors:
https://discord.bpbonline.com
Table of Contents
1. Introduction to GPT
  Introduction
  Structure
  Objectives
  GPT's family tree
  Data science
  Natural language processing
  OpenAI and GPT
  Evolution of GPT models
  OpenAI’s API Service
  OpenAI keys
  Python integration
  ChatGPT
  Recipes
  1. Creating an OpenAI account
  2. OpenAI Playgrounds
  3. Managing API keys
  4. Adding members
  Part 1, Adding an owner
  Part 2, Inviting readers
  Part 3, Service account
  Wrapping up
  Conclusion
  Points to remember
  Further reading
  Exercises
2. Crafting Your GPT Workspace
  Introduction
  Structure
  Objectives
  Python with Visual Studio Code
  Installing Visual Studio Code
  Setting up a Python environment
  Key features
  Jupyter Notebooks
  Getting started with Jupyter in VS Code
  Key features
  Integration with other tools
  Jupytext: Bridging notebooks and text files
  Why use Jupytext
  Using Jupytext in VS Code
  NLP toolbox
  Alternative developer environments
  Recipes
  5. Using iPython Secrets
  6. Hello NLP Toolbox
  Part 1, OpenAI’s Python Client Library
  Part 2, spaCy
  Part 3, Hugging Face Transformers (GPT-2)
  7. Setting up and using GitHub Codespaces for NLP projects
  8. Using Azure ML Notebook
  Conclusion
  Points to remember
  Exercises
3. Pre-processing
  Introduction
  Structure
  Objectives
  Tokenization
  Lowercasing
  Removing punctuation and special characters
  Removing stop words
  Stemming and lemmatization
  Stemming and lemmatization in the context of GPT
  Padding and truncation
  Encoding
  Handling missing values
  Missing values in text data
  Handling missing values with GPT
  Recipes
  9. Byte-Pair Encoding principles
  Use cases
  Requirements
  Step-by-step implementation
  Output with explanation
  10. Encoding and decoding with SentencePiece
  Requirements
  Step-by-step implementation
  Summing up
  11. Tokenizing with GPT and Hugging Face
  Requirements
  Step-by-step implementation
  Example output
  Summing up
  12. Removing stop words with NLTK
  Use cases
  Requirements
  Step-by-step implementation
  Summing up
  13. String translation with Python
  Use cases
  Requirements
  Step-by-step implementation
  Summing up
  14. Stemming and lemmatization with NLTK
  Use cases
  Requirements
  Step-by-step implementation
  Summing up
  15. Standard library padding and truncation
  16. Padding and truncation in practice with GPT
  17. Encoding in practice with GPT
  Step-by-step implementation
  Summing up
  18. How to count tokens with tiktoken
  Step-by-step implementation
  Summing up
  19. Imputing missing words with GPT
  Conclusion
  Points to remember
  Further reading
  Exercises
4. Embeddings
  Introduction
  Structure
  Objectives
  Background of embeddings
  Exploring the utility of embeddings
  Working of embeddings
  OpenAI API and embeddings
  Applications
  Types of embeddings
  Word embeddings
  Item embeddings
  Graph embeddings
  Custom embeddings
  Mathematical foundations
  Vector spaces
  Distance metrics
  Dimensionality reduction
  Pre-trained embeddings
  Vocabulary management
  Applications and use cases
  Text similarity and clustering
  Recommendation systems
  Sentiment analysis
  Translation
  Visualizing embeddings
  Recipes
  20. Loading pre-trained word embeddings
  21. Text pre-processing for embeddings
  Explanation
  Use cases
  Requirements
  Pre-processing with NLTK
  Pre-processing with SpaCy
  Summing up
  22. Using OpenAI’s models with text input
  Explanation of the code
  23. Calculating the similarity between embeddings
  Embedding similarity using NumPy
  Embedding similarity using SciPy
  24. Visualizing embeddings using t-SNE
  2D visualisation
  3D visualisation
  25. Applying embeddings for text classification
  Step 1, pre-processing
  Step 2, embeddings
  Step 3, building a classification model
  Step 4, testing
  Summing up
  26. Handling out-of-vocabulary words
  Example 1
  Example 2
  Summing up
  Conclusion
  Points to remember
  Exercises
  Further reading
5. Classifying Intent
  Introduction
  Structure
  Objectives
  Overview
  Evaluation metrics
  Why metrics matter
  Key metrics and confusion matrices
  Real-world considerations
  Datasets
  Importance of high-quality datasets
  Generally available datasets
  Challenges in dataset preparation
  Practical techniques and tools
  Role of datasets in the development pipeline
  Preparing for the recipes
  Techniques
  Feature-based machine learning
  How feature-based models work
  Summary
  Deep learning and fine-tuned LLMs
  Zero-shot and few-shot learning
  Summing up
  Recipes
  27. Explore the CLINC150 dataset
  Get the data
  Draw a word cloud
  Analyze topics
  Check for duplicate phrases
  Summing up
  28. Zero-shot and few-shot learning
  Zero-shot with gpt-4o-mini
  One-shot with gpt-4o-mini
  Few-shot with gpt-4o-mini
  Summing up
  29. DistilBERT finely tuned
  Get tokenizer and model
  Quick look
  Test a balanced sample
  Plot a confusion matrix
  Inspect the confusion
  Summing up
  30. SGDClassifier of intents
  Data preparation
  Pipeline setup
  Training the model
  Classifying sample inputs
  Confusion matrix
  Pickle the pipeline
  31. MLflow
  Get samples
  Load the model
  Run an experiment
  Explore the GUI
  Conclusion
  Points to remember
  Exercises
  Further reading
6. Hugging Face and GPT
  Introduction
  Structure
  Objectives
  Hugging Face basics
  Transformers library