
Pandas in 7 Days (Fabio Nelli)

Author: Fabio Nelli


Make data analysis fast, reliable, and clean with Python, Pandas, and Matplotlib.

KEY FEATURES
● A detailed walk-through of the Pandas library's features with multiple examples.
● Numerous graphical representations and reporting capabilities using the popular Matplotlib library.
● A high-level overview of extracting data from sources including files, databases, and the web.

DESCRIPTION
No matter how large or small your dataset is, in this book Fabio Nelli offers hands-on technical guidance on applying Pandas to data analysis with ease. Both newcomers and seasoned professionals will benefit from this book, which teaches you how to use the pandas library in just one week. Each day of the week, you will learn and practise the following features and data analysis exercises:
Day 01: Get familiar with the fundamental data structures of pandas, including declaration, data upload, indexing, and so on.
Day 02: Execute commands and operations related to data selection and extraction, including slicing, sorting, masking, iteration, and query execution.
Day 03: Advanced commands and operations such as grouping, multi-indexing, reshaping, cross-tabulations, and aggregations.
Day 04: Working with several DataFrames, including comparison, joins, concatenation, and merges.
Day 05: Cleaning, pre-processing, and numerous strategies for extracting data from external files, the web, databases, and other data sources.
Day 06: Working with missing data, interpolation, duplicate labels, boolean data types, text data, and time-series datasets.
Day 07: Introduction to Jupyter Notebooks, interactive data analysis, and analytical reporting with Matplotlib graphics.

WHAT YOU WILL LEARN
● Extract, cleanse, and process data from databases, text files, HTML pages, and JSON data.
● Work with DataFrames and Series, and apply functions to scale data manipulations.
● Graph your findings using charts typically used in modern business analytics.
● Learn to use all of the


Pandas in 7 Days
Utilize Python to manipulate data, conduct scientific computing, time series analysis, and exploratory data analysis
Fabio Nelli
www.bpbonline.com
FIRST EDITION 2022
Copyright © BPB Publications, India
ISBN: 978-93-5551-213-0

All Rights Reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception of the program listings, which may be entered, stored, and executed in a computer system but cannot be reproduced by means of publication, photocopy, recording, or any electronic or mechanical means.

LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY
The information contained in this book is true and correct to the best of the author's and publisher's knowledge. The author has made every effort to ensure the accuracy of this publication, but the publisher cannot be held responsible for any loss or damage arising from any information in this book. All trademarks referred to in the book are acknowledged as properties of their respective owners, but BPB Publications cannot guarantee the accuracy of this information.

www.bpbonline.com
Dedicated to all those who are constantly seeking awareness and, in a special way, to my wife and my son
About the Author
Fabio Nelli has a Master's degree in Chemistry and a Bachelor's degree in IT & Automation Engineering. He currently works with many research institutes and private companies, presenting educational courses on data analysis and data visualization technologies. He complements this activity by writing articles on the web (in particular on his website, meccanismocomplesso.org) and in-depth books on the subject.
About the Reviewers
Purna Chander Kathula is a Data Manager and Python Developer who guides teams working on complex data engineering tasks: building data pipelines, handling heavy-lifting jobs, legacy data ingestion, validation, data analysis, and visualization. His expertise lies in building tools and frameworks that validate structured, semi-structured, and unstructured data. He is Coursera-certified in "Applied Data Science with Python" from the University of Michigan. He has authored two books, "Instant Sikuli Test Automation" and "Hands-on Data Analysis and visualization with Pandas". He lives with his family and two children, and in his free time he loves watching fiction movies and spending time with family and friends.

Pragnesh Prajapati is a Data Scientist with extensive experience in building AI products. He currently works as a Senior Data Scientist at Cerebulb (India) Private Limited, developing predictive analytics products for industries such as sugar, cement, and mining. He also works as a freelance MLOps solution architect, developing MLOps solutions for big data architectures. He completed a B.E. in Electronics and Communication, where he was exposed to IoT devices. He has taken more than 12 online courses related to ML, Python, deep learning, MLOps, and AWS. He started his professional journey as an Embedded Consultant at Amnex Info Technologies, primarily working on IoT devices, and later worked as a machine learning consultant in the R&D department. He has worked on IoT, machine learning, and deep learning products. He started mentoring and tutoring students in 2021 and worked with an ed-tech company as a data science tutor. In his free time, he likes to learn new technologies (for instance, quantum machine learning), listen to podcasts, and play guitar.
Acknowledgements
My gratitude also goes to the team at BPB Publications for being supportive enough to give me time to finish the first part of the book and for allowing me to publish the book in multiple parts. Because data analysis is a vast and active area of research, it was impossible to deep-dive into different classes of problems in a single book, especially as I was trying not to make it too voluminous.
Preface
This book covers data analysis with the Pandas library in a simple and understandable way. The concepts are introduced gradually, with many practical examples and explanatory images, in order to facilitate and speed up learning.
The book is structured in two parts. The first part illustrates all the basic concepts of the Pandas library: its basic data structures, examples of its methods, and other data manipulations typical of data analysis. The second part shows how to apply the Pandas library to real-world data, that is, data available from the outside world. We will learn how to find data and how to acquire it from common external sources (data sources). Many problems posed by real data will be presented, along with methods for solving them. Finally, we will briefly show how to view our results through graphs and other forms of visualization, and how to save or document them by generating reports.
In parallel, the book can also be seen as divided into 10 chapters. The first two chapters introduce the topic of data analysis and illustrate the characteristics that make the Pandas library a valid tool for it. The following 7 chapters correspond to the 7-day course for learning Pandas in all its features and applications, through a whole series of examples. Each topic is covered individually, with its own examples. The last chapter points the reader to possible paths to take after completing the book, to continue their training as a data scientist or toward other professional applications.
The details of the 10 chapters are listed as follows:
Chapter 1 - Pandas, the Python library, gives a brief overview of the Pandas library and data analysis.
Chapter 2 - Setting up a Data Analysis Environment, explains how to install Pandas on various platforms and the different ways of working (workspaces).
Chapter 3 - Day 1 - Data Structures in Pandas library, provides an introduction to the basic data structures of the Pandas library, followed by their declaration, data upload, and other related concepts.
Chapter 4 - Day 2 - Working within a DataFrame, Basic Functionalities, discusses the basic commands and operations to be performed on Series and DataFrames, such as selecting and extracting data and computing statistics.
Chapter 5 - Day 3 - Working within a DataFrame, Advanced Functionalities, explains advanced commands and operations for data analysis, such as grouping, multi-indexing, and other more elaborate techniques.
Chapter 6 - Day 4 - Working with two or more DataFrames, discusses in detail the commands and operations involving multiple DataFrames, such as comparison, joins, and concatenation.
Chapter 7 - Day 5 - Working with data sources and real-world datasets, covers the data formats available for input and output, and real-world data: their importance, their characteristics, and the need to process them. It presents data sources such as databases and the web (HTML pages and data sources available online), and the cleaning and processing of data before analysis, with practical examples of searching for, pre-processing, cleaning, and then analyzing real acquired data.
Chapter 8 - Day 6 - Troubleshooting Challenges with Real Datasets, explains that real data acquired from external sources are never ready to be processed as they are. They must first be pre-processed (cleaned), and their data formats must be standardized with each other. Even after these preliminary operations, the loaded data can present significant problems such as empty fields, incorrect data, or textual values that are difficult to work with.
Chapter 9 - Day 7 - Data Visualization and Reporting, shows that once we have analyzed the data and obtained our results, they need to be visualized in some way, and that Python with the Matplotlib library offers remarkable visualization tools. The chapter also introduces Jupyter Notebooks, interactive data analysis, and reporting.
Chapter 10 - Conclusion - Moving Beyond, provides a brief conclusion on the possibilities that open up once you become familiar with the Pandas library. It discusses the data scientist profession today, its applications, and other Python libraries to use.
Code Bundle and Coloured Images
Please follow the link to download the Code Bundle and the Coloured Images of the book:
https://rebrand.ly/b34fcf
The code bundle for the book is also hosted on GitHub at https://github.com/bpbpublications/Pandas-in-7-Days. If the code is updated, the changes will be made in the existing GitHub repository.
We have code bundles from our rich catalogue of books and videos available at https://github.com/bpbpublications. Check them out!

Errata
We take immense pride in our work at BPB Publications and follow best practices to ensure the accuracy of our content and to provide an engaging reading experience to our subscribers. Our readers are our mirrors, and we use their inputs to reflect on and improve upon any human errors that may have occurred during the publishing process. To help us maintain quality and reach out to any readers who might be having difficulties due to unforeseen errors, please write to us at:
errata@bpbonline.com
Your support, suggestions, and feedback are highly appreciated by the BPB Publications family.

Did you know that BPB offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.bpbonline.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at business@bpbonline.com for more details.
At www.bpbonline.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on BPB books and eBooks.
Piracy
If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at business@bpbonline.com with a link to the material.

If you are interested in becoming an author
If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit www.bpbonline.com. We have worked with thousands of developers and tech professionals, just like you, to help them share their insights with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions. We at BPB can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about BPB, please visit www.bpbonline.com.
Table of Contents

1. Pandas, the Python Library
  Structure
  Objective
  A bit of history
  Why use Pandas (and Python) for data analysis?
  A trust gained over the years
  A flexible language that adapts to any context
  Automation, reproducibility, and interaction
  Data Analysis
  What is data analysis?
  The data scientist
  The analysis process
  Tabular form of data
  Spreadsheets
  SQL tables from databases
  Pandas and Dataframes
  Conclusion
  References

2. Setting up a Data Analysis Environment
  Structure
  Objective
  The basic procedure
  Installing Python
  Python 3.x vs Python 2.7
  Installing Pandas and the other libraries
  Installing Jupyter Notebooks
  Considerations
  Anaconda
  The Anaconda distribution
  Installing Anaconda
  Installing Anaconda on Linux systems
  Conda and the Anaconda command line shell
  Creating and managing environments with Conda
  Conda – Create an environment with a different Python version
  Conda – Export the environments
  Creating and managing packages (and libraries) with Conda
  Jupyter Notebook
  Creating a Notebook on Jupyter
  First steps on a new Jupyter Notebook
  Working online on development environments: Replit
  Replit
  Creating a Python workspace for data analysis on Replit
  Conclusion
  Questions
  Multiple Choice Questions
  Answers
  Key Terms
  References/Further Readings

3. Day 1 - Data Structures in Pandas library
  Structure
  Objective
  Introduction to the Series
  A Series as a sequence of data
  The Index() constructor
  The name attribute of Series
  Various ways to build Series
  The Series() constructor
  Creating a Series from a NumPy array
  Creating a Series from a scalar value
  Creating a Series from a dictionary
  Creating a Series from a list of tuples
  Creating a Series from a 2-element array list
  Introduction to DataFrames
  The DataFrame as an indexed data table
  The DataFrame() constructor
  Viewing DataFrames in Jupyter Notebook
  Other constructor arguments
  Access the various elements of the DataFrame with iloc[ ] and loc[ ]
  Various ways to build DataFrames
  Creating a DataFrame from a Numpy ndarray
  Creating a DataFrame from a Series dictionary
  Creating a DataFrame from a ndarray dictionary
  Pass-by-copy with Dictionaries
  Creating a DataFrame from a structured array or a record array
  Creating a DataFrame from a list of dictionaries
  Creating a DataFrame from a list of Data Classes
  Creating a DataFrame from a list of Named Tuples
  MultiIndex DataFrame
  Creating a MultiIndex DataFrame from a dictionary of tuples
  Conclusion
  Questions
  Multiple Choice Questions
  Answers
  Key Terms
  References

4. Day 2 - Working within a DataFrame, Basic Functionalities
  Structure
  Objective
  Viewing Data
  Direct printing of the DataFrame values
  The head() and tail() methods
  Selection
  Selection by Subsetting
  Subsetting by index or by position
  Scalars and Series as return values
  Indexing Operators and Indexers
  loc[ ] – Selection by Labels
  iloc[ ] – Selection by location
  Indexer ix[ ]
  at[] and iat[] – Selection by value
  Selection on dtype
  Filtering
  The Boolean condition
  Filtering on the lines
  Filtering on columns
  Application of multiple conditions
  Boolean reductions
  Filtering with isin()
  Editing
  Adding, inserting, and deleting a column in a DataFrame
  Adding, inserting, and deleting a row in a DataFrame
  Adding new columns with assign()
  Descriptive Statistics
  The describe() method
  Calculation of individual statistics
  The Skipna option
  The standardization of data
  Transposition, Sorting, and Reindexing
  Transposition
  Sorting
  Sorting by label
  Sorting by values
  Reindexing
  Reindexing using another DataFrame as a reference
  Conclusion
  Questions
  MCQs
  Solution
  Key Terms
  References/Further Readings

5. Day 3 - Working within a DataFrame, Advanced Functionalities
  Structure
  Objective
  Shifting
  The shift() method
  Reshape
  The stack() and unstack() methods
  Pivoting
  Iteration
  Iteration with for loops
  Methods of iteration
  Row iterations in DataFrames
  Application of Functions on a Dataframe
  The levels of operability of the functions
  Applying NumPy functions directly on DataFrame
  Applying functions with the apply() method
  Applications of functions with the pipe() method
  Chaining pipe() and apply() methods
  Applying functions with arguments with the pipe() method
  The applymap() and map() methods
  Transforming
  The transform() method
  Reducing a MultiIndex DataFrame to a DataFrame
  Transform() does not aggregate elements
  Transform() works with a single column at a time
  Aggregation
  The agg() method
  Grouping
  The groupby() method
  The groupby() method for indexes
  Grouping with Transformation
  Grouping with Aggregation
  Apply() with groupby()
  Categorization
  Specifying a category column
  Conclusion
  Questions
  MCQs
  Solution
  Key Terms
  References/Further Readings

6. Day 4 - Working with Two or More DataFrames
  Structure
  Objective
  Add data to a DataFrame
  The append() method
  Appending multiple Series
  The concat() function
  Concatenations between DataFrame and Series that are not homogeneous
  Multiple concatenations
  Joining between two DataFrames
  The merge() method
  The LEFT Join
  The RIGHT Join
  The OUTER Join
  The INNER Join
  Arithmetic with DataFrames
  Two homogeneous DataFrames as operands
  Two non-homogeneous DataFrames as operands
  Operations between a DataFrame and a Series
  Flexible binary arithmetic methods
  Flexible binary methods and arithmetic operators
  Differences between arithmetic operators and flexible binary methods
  Arithmetic operations between subsets of the same DataFrame
  Arithmetic operations between subsets of different DataFrames
  The relative binary methods
  Boolean operations
  Boolean operations between two DataFrames
  Flexible Boolean binary methods
  Aligning on DataFrames
  The align() method
  Aligning with the Join modalities
  Conclusion
  Questions
  Multiple choice questions
  Solution
  Key Terms
  References/Further Readings

7. Day 5 - Working with Data Sources and Real-World Datasets
  Structure
  Objective
  Files as external data sources
  Read and write data from files in CSV format
  Graphical display of DataFrames in Jupyter Notebook
  Preview of the content being written to file
  Some optional to_csv() parameters
  Read and Write Microsoft Excel File
  Read data from HTML pages on the web
  Read and write data from XML files
  Read and Write data in JSON
  Write and read data in binary format
  Final thoughts on saving data to file
  Interact with databases for data exchange
  SQLite installation
  Creation of test data on SQLite
  Other data sources
  Working with Time Data
  Date Times
  Time Deltas
  Create arrays of time values
  Import Date Time values from file
  Working with Text Data
  The methods of correcting the characters of the string
  Methods for splitting strings
  Methods for concatenating columns of text
  Separate alphanumeric from numeric characters in a column
  Adding characters to the rows of a column
  Recognition methods between alphanumeric and numeric characters
  Searching methods in strings
  Conclusion
  Questions
  Multiple choice questions
  Solution
  Key Terms
  References/Further Readings