🐼 Pandas Basics: Introduction
date
Apr 2, 2023
tags
Pandas Basics
summary
A Pandas Library Introduction in Python.
Last updated
Apr 3, 2023 12:50 AM
👋 What's shaking, bacon? In this new lesson I will give you a simple introduction to Pandas, a Python library for manipulating datasets in a variety of formats, such as CSV and XLSX.
First things first, let's install it by running the following command in your Command Prompt:
Using PIP
pip install pandas
Using Conda
conda install pandas
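To confirm the install worked, a quick check is to import the library and print its version. This is only a minimal sketch; the exact version printed depends on your environment.

```python
import pandas as pd

# Prints the installed Pandas version string; the value depends on your environment
print(pd.__version__)
```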
Okay, now that we have Pandas installed, let's get to the code!!
PS.: you can download the Jupyter Notebook and the dataset files from this lesson in this GitHub Repository pandas-basics.
0) Reading Dataset
To read datasets, we use the read_csv function. Before you ask me about its parameters: I will not cover all of them here, because there are a bunch of them. If you are interested, you can check them out in the Pandas read_csv documentation.
#
# ---- Reading CSV Dataset ----
#
import pandas as pd
filepath = "./datasets/jojo-stands.csv"
df = pd.read_csv(filepath)
Don't worry, I planned this error!!!
I wanna say that 80% of the datasets you'll work with in the future will use the UTF-8 charset. However, especially if you live in a country where this charset is not the default, such as Japan, you will get the same error I got here (a UnicodeDecodeError): there are characters that cannot be decoded as UTF-8.
To solve this, we will use the chardet library. It reads a fragment of the dataset and guesses which charset it is in. After that, you can try to read the dataset again with Pandas, passing the proper charset. To install chardet, run the following command in your command prompt:
Using PIP
pip install chardet
Using Conda
conda install chardet
With the library installed, let's find out which charset the dataset is in.
#
# ---- Figuring Out Dataset Charset with Chardet ----
#
import chardet
# Reading the first 100,000 bytes to guess the charset
with open(filepath, 'rb') as file:
    guessed_chardet = chardet.detect(file.read(100000))
print(guessed_chardet)
{'encoding': 'ISO-8859-1', 'confidence': 0.73, 'language': ''}
Hmm, so there is a 73% chance that the charset is ISO-8859-1. So let's try to read the dataset with this charset. If we get the same error again, we run chardet again, this time reading the first 200,000 bytes.
df = pd.read_csv(filepath, encoding='ISO-8859-1')
# If everything goes well, let's print the first 5 rows
df.head()
Yeaaay, we got it!! Now, to finish this first part, let's count how many rows and columns the dataset contains, list the column names and display a basic statistical overview.
print(f"Number of Rows: {df.shape[0]}")
print(f"Number of Columns: {df.shape[1]} ({list(df.columns)})")
df.describe()
Number of Rows: 156
Number of Columns: 7 (['Stand', 'PWR', 'SPD', 'RNG', 'PER', 'PRC', 'DEV'])
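The encoding story from this section can be reproduced without the lesson's dataset. The sketch below is only an illustration under stated assumptions: the two-line sample and the name "Café" are made up, and the bytes live in memory instead of a file. It shows that Pandas assumes UTF-8 by default and that passing the encoding parameter fixes the read.

```python
import io
import pandas as pd

# Made-up sample: one name with an accented character, encoded as ISO-8859-1
raw_bytes = "Stand,PWR\nCafé,A\n".encode("ISO-8859-1")

try:
    pd.read_csv(io.BytesIO(raw_bytes))  # Pandas assumes UTF-8 by default
    error_name = None
except UnicodeDecodeError as error:
    error_name = type(error).__name__
    print(f"{error_name}: {error}")

# Passing the matching encoding makes the same bytes readable
df_demo = pd.read_csv(io.BytesIO(raw_bytes), encoding="ISO-8859-1")
print(df_demo)
```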
1) Operations
Operations-wise, we will cover the five main ones: renaming, selecting, updating, inserting and deleting/dropping.
1.1) Renaming
Renaming means changing the column names. In this part, let's rename the columns as follows:
New Features
PWR >> Power
SPD >> Speed
RNG >> Range
PER >> Stamina
PRC >> Precision
DEV >> Development_Potencial
#
# ---- Renaming Columns ----
#
new_names = {
'PWR' : 'Power'
, 'SPD' : 'Speed'
, 'RNG' : 'Range'
, 'PER' : 'Stamina'
, 'PRC' : 'Precision'
, 'DEV' : 'Development_Potencial'
}
df.rename(columns=new_names, inplace=True)
df.head()
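As a side note, rename also works without inplace=True: it then returns a new Data Frame and leaves the original untouched. A small sketch on a made-up two-row frame (the values are placeholders, not rows from the dataset):

```python
import pandas as pd

# Made-up sample with the same short column names as the dataset
sample = pd.DataFrame({"Stand": ["X", "Y"], "PWR": ["A", "B"], "SPD": ["C", "D"]})

# Without inplace=True, rename returns a renamed copy
renamed = sample.rename(columns={"PWR": "Power", "SPD": "Speed"})

print(list(sample.columns))   # original names are unchanged
print(list(renamed.columns))  # new names live on the returned copy
```

Which style you pick is mostly a matter of taste; the copy-returning form makes it easier to keep the original around for comparison.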
1.2) Selecting
Now, let's select some columns and rows of our Data Frame. There are several ways to do it, so I will show just the most used ones here.
#
# ---- Selecting a Single Column ----
#
df['Power'].head()
#
# ---- Selecting Multiple Columns ----
#
df[['Power', 'Speed', 'Development_Potencial']].head()
#
# ---- Selecting a Single Row ----
#
df[0:1]
#
# ---- Selecting Multiple Rows ----
#
df[0:10]
#
# ---- Selecting Rows with iloc ----
#
df.iloc[15:20]
#
# ---- Selecting Rows with Conditions ----
#
#
# - selecting stands with Power and Development_Potencial stats equal to 'A'
#
df.loc[(df['Power'] == 'A') & (df['Development_Potencial'] == 'A')]
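One detail worth a sketch: iloc selects by position while loc selects by label, and loc can pick rows and columns at once. The frame below is made up for illustration, with index labels that deliberately differ from the row positions.

```python
import pandas as pd

# Made-up sample; the index labels (10, 20, 30) are chosen to differ from positions
sample = pd.DataFrame(
    {"Stand": ["X", "Y", "Z"], "Power": ["A", "B", "A"], "Speed": ["C", "A", "A"]},
    index=[10, 20, 30],
)

first_by_position = sample.iloc[0]  # first row, whatever its label is
row_by_label = sample.loc[20]       # row whose label is 20

# Boolean condition for the rows, list of names for the columns
strong_and_fast = sample.loc[
    (sample["Power"] == "A") & (sample["Speed"] == "A"), ["Stand", "Power"]
]

print(first_by_position["Stand"])
print(row_by_label["Stand"])
print(strong_and_fast)
```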
1.3) Updating
Updating means changing row values. In this example, let's apply an encoding to the values, that is, convert the strings to numbers:
Encoding
None >> 0
E >> 1
D >> 2
C >> 3
B >> 4
A >> 5
Infi >> 999
#
# ---- Updating Values ----
#
df.fillna(0, inplace=True)
df.replace('None', 0, inplace=True)
df.replace('E', 1, inplace=True)
df.replace('D', 2, inplace=True)
df.replace('C', 3, inplace=True)
df.replace('B', 4, inplace=True)
df.replace('A', 5, inplace=True)
df.replace('Infi', 999, inplace=True)
df.head()
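The calls above can also be written as one mapping. This is just an alternative sketch, not the lesson's method: it uses a made-up sample and applies map column by column, so only the stat columns are touched. One difference to keep in mind is that map turns any value missing from the dictionary into NaN, which the sketch then sets to 0.

```python
import pandas as pd

# Made-up sample; the Python None stands in for a missing value
sample = pd.DataFrame(
    {"Stand": ["X", "Y", "Z"], "Power": ["A", "None", "Infi"], "Speed": ["E", None, "C"]}
)

encoding = {"None": 0, "E": 1, "D": 2, "C": 3, "B": 4, "A": 5, "Infi": 999}
stat_columns = ["Power", "Speed"]

for column in stat_columns:
    # map turns unknown or missing values into NaN, which fillna then sets to 0
    sample[column] = sample[column].map(encoding).fillna(0).astype(int)

print(sample)
```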
1.4) Deleting / Dropping
Now, let's delete / drop the first row - adiós Anubis Stand! Oh, delete and drop mean the same thing; both terms are interchangeable.
#
# ---- Deleting / Dropping Rows ----
#
anubis_stand = df.iloc[0]
df.drop(0, inplace=True)
print(f'Anubis Stand: {anubis_stand}')
df.head()
Anubis Stand: Stand Anubis
Power 4
Speed 4
Range 1
Stamina 5
Precision 1
Development_Potencial 3
Name: 0, dtype: object
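Two things are easy to miss here, so here is a small sketch on a made-up frame: drop without inplace=True returns a new Data Frame, and the remaining rows keep their old labels unless you call reset_index.

```python
import pandas as pd

# Made-up sample, not rows from the dataset
sample = pd.DataFrame({"Stand": ["X", "Y", "Z"], "Power": [4, 5, 1]})

# Drop the row labelled 0; the original frame is left as it was
without_first = sample.drop(0)
print(list(without_first.index))  # labels of the remaining rows

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels
renumbered = without_first.reset_index(drop=True)
print(list(renumbered.index))

# drop also removes columns when given the columns argument
without_power = sample.drop(columns=["Power"])
print(list(without_power.columns))
```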
1.5) Inserting
Hmmm, I like Anubis Stand, so let's add it again!!
#
# ---- Inserting Rows ----
#
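# - adding to the end
#
# after dropping row 0 the labels run from 1 to 155, so len(df.index) (155)
# would overwrite the last row; use the next unused label instead
df.loc[df.index.max() + 1] = anubis_stand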
df
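Since the row labels no longer start at 0 after a drop, it helps to see why the label chosen for a new row matters. In this sketch (made-up data, not the lesson's dataset), the row count points at a label that already exists, while pd.concat with ignore_index=True appends the row and renumbers everything.

```python
import pandas as pd

# Made-up sample, not rows from the dataset
sample = pd.DataFrame({"Stand": ["X", "Y", "Z"], "Power": [4, 5, 1]})

removed_row = sample.iloc[0]
sample = sample.drop(0)  # remaining labels: 1 and 2

# len(sample.index) is 2, and label 2 is already taken by stand "Z"
label_is_taken = len(sample.index) in sample.index

# Turn the Series back into a one-row frame and append it with fresh labels
restored = pd.concat([sample, removed_row.to_frame().T], ignore_index=True)

print(label_is_taken)
print(restored)
```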
Well, it was kind of a long lesson today, wasn't it? But you gotta agree with me, this lesson was amazing!
See you in the next post!! 👋