🐼 Pandas Basics: Introduction

Date: Apr 2, 2023
Last updated: Apr 3, 2023 12:50 AM
Tags: Pandas Basics
Summary: A Pandas Library Introduction in Python.

👋 What's shaking, bacon? In this lesson I will give you a simple introduction to Pandas, a Python library for manipulating datasets in a variety of formats, such as CSV and XLSX.
 
First things first, let's install it by running the following command on your Command Prompt:
 
Using PIP
pip install pandas
Using Conda
conda install pandas
 
Okay, now that we have Pandas installed, let's get to the code!!
PS: you can download the Jupyter Notebook and the dataset files for this lesson from the pandas-basics GitHub repository.
 

 

0) Reading Dataset

 
To read datasets, we use the read_csv function. Before you ask me about the parameters: I will not cover all of them here, because there are a bunch of them. If you are interested in them, you can check them out in the Pandas read_csv documentation.
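To give a feel for those parameters, here is a minimal sketch of three commonly used ones: sep, usecols and nrows. The CSV text and its rows are made up for illustration (they are not from the lesson's dataset), and io.StringIO only stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk (made-up rows).
csv_text = (
    "Stand;PWR;SPD\n"
    "Stand One;A;A\n"
    "Stand Two;B;B\n"
    "Stand Three;D;C\n"
)

sample = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",                   # column delimiter (the default is ",")
    usecols=["Stand", "PWR"],  # load only these columns
    nrows=2,                   # stop after the first 2 data rows
)

print(sample)
```

Limiting columns and rows like this is handy for peeking at a large file before loading all of it.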
 
#
# ---- Reading CSV Dataset ----
#
import pandas as pd

filepath = "./datasets/jojo-stands.csv"
df = pd.read_csv(filepath)
Don't worry, I planned this error!!!
I'd guess that around 80% of the datasets you'll work with in the future will be encoded in UTF-8. However, especially if you live in a country where this charset is not the default, such as Japan, you will get the same error I got here: pandas raises a UnicodeDecodeError because the file contains bytes that cannot be decoded as UTF-8.
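If you want to reproduce this kind of failure without the lesson's dataset, the sketch below writes a tiny file in ISO-8859-1 and then reads it back twice. The file name and its contents are made up for the example.

```python
import os
import tempfile

import pandas as pd

# Write a tiny CSV encoded as ISO-8859-1; "é" becomes the single byte 0xE9,
# which is not a valid UTF-8 sequence on its own.
tmp_path = os.path.join(tempfile.mkdtemp(), "latin1-sample.csv")
with open(tmp_path, "w", encoding="ISO-8859-1") as f:
    f.write("name,score\nCafé,1\n")

try:
    pd.read_csv(tmp_path)  # pandas assumes UTF-8 by default
    failed = False
except UnicodeDecodeError as err:
    failed = True
    print(f"Could not decode as UTF-8: {err}")

# Reading again with the matching encoding works.
fixed = pd.read_csv(tmp_path, encoding="ISO-8859-1")
print(fixed)
```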
 
To solve this, we will use the chardet library. It reads a fragment of the dataset and guesses which charset it uses. After that, you can try to read the dataset again with pandas, passing the proper charset. To install chardet, run the following command on your command prompt:
 
Using PIP
pip install chardet
Using Conda
conda install chardet
 
With the library installed, let's find out which charset the dataset is in.
 
#
# ---- Figuring Out Dataset Charset with Chardet ----
#
import chardet 

# Reading the first 100,000 bytes to guess the charset
with open(filepath, 'rb') as file:
    guessed_chardet = chardet.detect(file.read(100000))

print(guessed_chardet)
{'encoding': 'ISO-8859-1', 'confidence': 0.73, 'language': ''}
 
Hmm, so chardet is 73% confident that the charset is ISO-8859-1. Let's try to read the dataset with this charset. If we get the same error again, we can run chardet again, this time reading the first 200,000 bytes.
 
df = pd.read_csv(filepath, encoding='ISO-8859-1')

# If everything goes well, let's print the first 5 rows
df.head()
 
Yeaaay, we got it!! Now, to finish this first part, let's count how many rows and columns the dataset contains, list the column names and display a basic statistical overview.
 
print(f"Number of Rows: {df.shape[0]}")
print(f"Number of Columns: {df.shape[1]} ({list(df.columns)})")

df.describe()
Number of Rows: 156
Number of Columns: 7 (['Stand', 'PWR', 'SPD', 'RNG', 'PER', 'PRC', 'DEV'])
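Since the stats in this dataset are letters, describe() will not show means or quartiles. The sketch below uses a made-up three-row frame (not the real data) to show what it reports for text columns.

```python
import pandas as pd

# Made-up rows in the same shape as the lesson's dataset (not real data).
demo = pd.DataFrame({
    "Stand": ["Stand One", "Stand Two", "Stand Three"],
    "PWR": ["A", "B", "A"],
    "SPD": ["C", "C", "E"],
})

n_rows, n_cols = demo.shape      # shape is a (rows, columns) tuple
print(n_rows, n_cols)
print(demo.columns.tolist())

# For text columns, describe() reports count, unique, top and freq
summary = demo.describe()
print(summary)
```

Here top is the most frequent value in each column and freq is how many times it appears.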
 

 

1) Operations

 
Operations-wise, we will cover the five main ones: renaming, selecting, updating, inserting and deleting/dropping.
 

1.1) Renaming

 
Renaming means changing the column names. In this part, let's rename the columns as follows:
 
New Features
PWR >> Power
SPD >> Speed
RNG >> Range
PER >> Stamina
PRC >> Precision
DEV >> Development_Potencial
 
 
#
# ---- Renaming Columns ----
#
new_names = {
    'PWR'    :  'Power'
    , 'SPD'  :  'Speed'
    , 'RNG'  :  'Range'
    , 'PER'  :  'Stamina'
    , 'PRC'  :  'Precision'
    , 'DEV'  :  'Development_Potencial'
}

df.rename(columns=new_names, inplace=True)
df.head()
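If you would rather keep the original DataFrame untouched, rename also works without inplace=True. The sketch below uses a made-up one-row frame, and the "XYZ" key is a hypothetical name that matches no column.

```python
import pandas as pd

# Made-up frame for illustration only.
demo = pd.DataFrame({"Stand": ["Stand One"], "PWR": ["A"], "SPD": ["B"]})

# rename() returns a new DataFrame by default; `demo` itself is unchanged
renamed = demo.rename(columns={"PWR": "Power", "SPD": "Speed"})

# Keys that match no column are ignored unless you pass errors="raise"
same = demo.rename(columns={"XYZ": "Nope"})

print(renamed.columns.tolist())
print(demo.columns.tolist())
```

Passing errors="raise" can help catch a misspelled column name early.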
 

1.2) Selecting

 
Now, let's select some columns and rows of our Data Frame. There are several ways to do it, so I will be showing just the most used ones here.
 
#
# ---- Selecting a Single Column ----
#
df['Power'].head()
 
#
# ---- Selecting Multiple Columns ----
#
df[['Power', 'Speed', 'Development_Potencial']].head()
 
#
# ---- Selecting a Single Row ----
#
df[0:1]
 
#
# ---- Selecting Multiple Rows ----
#
df[0:10]
 
#
# ---- Selecting Rows with iloc ----
#
df.iloc[15:20]
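One detail worth knowing: iloc selects by position, while loc selects by index label. With the default 0, 1, 2... index the two look alike, so this sketch uses a made-up frame whose labels deliberately differ from the positions.

```python
import pandas as pd

demo = pd.DataFrame(
    {"Power": ["A", "B", "C", "D"]},
    index=[10, 20, 30, 40],   # labels deliberately differ from positions
)

by_position = demo.iloc[1:3]   # positions 1 and 2; the end is excluded
by_label = demo.loc[10:30]     # labels 10 through 30; the end is included

one_row_series = demo.iloc[0]     # a single position returns a Series
one_row_frame = demo.iloc[[0]]    # a list of positions returns a DataFrame

print(by_position)
print(by_label)
```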
 
#
# ---- Selecting Rows with Conditions ----
#
#
# - selecting stands with Power and Development_Potencial stats equal to 'A'
#
df.loc[(df['Power'] == 'A') & (df['Development_Potencial'] == 'A')]
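Conditions can be combined in other ways too. Here is a small sketch, on a made-up frame, of OR (|), NOT (~) and membership tests with isin, plus filtering rows and picking a column in a single loc call.

```python
import pandas as pd

# Made-up rows for illustration only.
demo = pd.DataFrame({
    "Stand": ["Stand One", "Stand Two", "Stand Three", "Stand Four"],
    "Power": ["A", "B", "A", "E"],
    "Speed": ["C", "A", "A", "E"],
})

either_a = demo[(demo["Power"] == "A") | (demo["Speed"] == "A")]  # OR
not_a = demo[~(demo["Power"] == "A")]                             # NOT
top_tier = demo[demo["Power"].isin(["A", "B"])]                   # membership

# loc can filter rows and pick a column in one step
names = demo.loc[demo["Power"] == "A", "Stand"]

print(either_a)
print(names.tolist())
```

The parentheses around each comparison matter, because & and | bind more tightly than == in Python.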
 
 

1.3) Updating

 
Updating means changing the values stored in the rows. In this example, let's apply an encoding to the values, that is, convert the strings to numbers:
 
Encoding
None  >>    0
E     >>    1
D     >>    2
C     >>    3
B     >>    4
A     >>    5
Infi  >>  999
 
#
# ---- Updating Values ----
#
# Letter grade -> number, following the encoding table above
encoding = {'None': 0, 'E': 1, 'D': 2, 'C': 3, 'B': 4, 'A': 5, 'Infi': 999}

df.fillna(0, inplace=True)          # missing values become 0
df.replace(encoding, inplace=True)  # replace every cell that matches a key

df.head()
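Replacing values across the whole DataFrame also touches the Stand column if a name happens to equal a grade. As an alternative sketch (not the approach used in this lesson), you can map only the stat columns. The frame below is made up, including a stand deliberately named "A".

```python
import pandas as pd

# Made-up rows; the first name deliberately equals a grade letter.
demo = pd.DataFrame({
    "Stand": ["A", "Stand Two"],
    "Power": ["A", "None"],
    "Speed": ["Infi", "E"],
})

grades = {"None": 0, "E": 1, "D": 2, "C": 3, "B": 4, "A": 5, "Infi": 999}
stat_cols = ["Power", "Speed"]

encoded = demo.copy()
for col in stat_cols:
    # map() looks each value up in the dict; values with no entry become NaN
    encoded[col] = encoded[col].map(grades)

print(encoded)
```

The stat columns become numeric while the name "A" in the Stand column is left alone.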
 
 

1.4) Deleting / Dropping

 
Now, let's delete / drop the first row - adiós Anubis Stand! Oh, delete and drop mean the same thing; both terms are interchangeable.
 
#
# ---- Deleting / Dropping Rows ----
#
anubis_stand = df.iloc[0]
df.drop(0, inplace=True)

print(f'Anubis Stand: {anubis_stand}')
df.head()
Anubis Stand: Stand                    Anubis
Power                         4
Speed                         4
Range                         1
Stamina                       5
Precision                     1
Development_Potencial         3
Name: 0, dtype: object
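drop can do a bit more than remove one row by its label. The sketch below, on a made-up frame, shows a few variations without inplace=True, plus reset_index to renumber the rows afterwards.

```python
import pandas as pd

# Made-up rows for illustration only.
demo = pd.DataFrame(
    {"Stand": ["Stand One", "Stand Two", "Stand Three"], "Power": [5, 4, 3]}
)

no_first = demo.drop(index=0)               # drop a row by its label
no_power = demo.drop(columns=["Power"])     # drop a column
no_last = demo.drop(index=demo.index[-1])   # drop by position, via the index

# After a drop the remaining labels keep their old values;
# reset_index(drop=True) renumbers them from 0
renumbered = no_first.reset_index(drop=True)

print(no_first.index.tolist())
print(renumbered.index.tolist())
```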
 
 

1.5) Inserting

 
Hmmm, I like Anubis Stand, so let's add it again!!
 
#
# ---- Inserting Rows ----
#
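# - adding to the end
#   (after the drop above, len(df.index) is a label that already exists,
#    so it would overwrite the last row; the largest label + 1 is always free)
#
df.loc[df.index.max() + 1] = anubis_stand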
df
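Another common way to add a row is pd.concat. This is only a sketch on a made-up frame; removed_row is a hypothetical stand-in for a row you saved earlier.

```python
import pandas as pd

# Made-up frame whose label 0 is missing, as it would be after a drop.
demo = pd.DataFrame(
    {"Stand": ["Stand Two", "Stand Three"], "Power": [4, 3]},
    index=[1, 2],
)
removed_row = pd.Series({"Stand": "Stand One", "Power": 5}, name=0)

# Turn the Series into a one-row DataFrame, then stack it under `demo`;
# ignore_index=True renumbers the result from 0
restored = pd.concat([demo, removed_row.to_frame().T], ignore_index=True)

print(restored)
```

Because the transposed row holds generic Python objects, numeric columns may come back with the object dtype, so you may want to convert them (for example with astype) afterwards.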
 

 
 

Well, that was kind of a long lesson today, wasn't it? But you gotta agree with me, this lesson was amazing!
 
See you in the next post!! 👋