Apr 2, 2023
A Pandas Library Introduction in Python.
Apr 3, 2023 12:50 AM
👋 What's shaking, bacon? In this new lesson I will show you a simple introduction to Pandas, a Python library for manipulating datasets in a variety of formats, such as CSV and XLSX.
First things first, let's install it by running the following command on your Command Prompt:
Using PIP
pip install pandas
Using Conda
conda install pandas
Okay, now that we have Pandas installed, let's get to the code!!
P.S.: you can download the Jupyter Notebook and the dataset files for this lesson from the GitHub repository pandas-basics.


0) Reading Dataset

To read datasets, we use the read_csv function. Before you ask me about its parameters: I will not cover all of them here, because there are a bunch of them. If you are interested in them, you can check them out in the Pandas read_csv documentation.
# ---- Reading CSV Dataset ----
import pandas as pd

filepath = "./datasets/jojo-stands.csv"
df = pd.read_csv(filepath)
Running this raises a decoding error. Don't worry, I planned this error!!!
I wanna say that about 80% of the datasets you'll be working with in the future will be encoded in UTF-8. However, especially if you live in a country where this charset is not the default, such as Japan, you will get the same error I got here: there are characters that cannot be decoded as UTF-8.
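To see where this kind of error comes from, here is a minimal sketch using only Python's built-in `str.encode` and `bytes.decode`. The sample word is an assumption for illustration and is not taken from the dataset: the same bytes decode fine as ISO-8859-1 but not as UTF-8.

```python
# Sample text (an assumption for illustration, not taken from the dataset)
text = "adiós"

# Store the text the way an ISO-8859-1 encoded file would
latin1_bytes = text.encode("ISO-8859-1")

# Reading those bytes back as UTF-8 fails, because the byte for "ó" is not valid UTF-8 here
try:
    latin1_bytes.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError as error:
    decoded_as_utf8 = False
    print(f"UTF-8 failed: {error}")

# Reading them back with the charset they were written in works
recovered = latin1_bytes.decode("ISO-8859-1")
print(recovered)
```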
To solve this, we will use the chardet library. It reads a fragment of the dataset and guesses which charset it is in. After that, you can try to read the dataset again with pandas, passing the proper charset. To install chardet, run the following command on your command prompt:
Using PIP
pip install chardet
Using Conda
conda install chardet
With the library installed, let's find out which charset the dataset is in.
# ---- Figuring Out Dataset Charset with Chardet ----
import chardet

# Reading the first 100,000 bytes to guess the charset
with open(filepath, 'rb') as file:
    guessed_chardet = chardet.detect(file.read(100000))

print(guessed_chardet)

{'encoding': 'ISO-8859-1', 'confidence': 0.73, 'language': ''}
Hmm, so chardet is 73% confident that the charset is ISO-8859-1. Let's try to read the dataset with this charset. If we get the same error again, we can run chardet again, this time reading the first 200,000 bytes.
df = pd.read_csv(filepath, encoding='ISO-8859-1')

# If everything goes well, let's print the first 5 rows
df.head()
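If the guessed charset had failed too, one option is to try a short list of candidate charsets in order. This is only a sketch under assumptions: `read_csv_with_fallback` is a hypothetical helper (it is not part of pandas), and the sample bytes are made up to stand in for the real file.

```python
import io

import pandas as pd


def read_csv_with_fallback(raw_bytes, encodings=("utf-8", "ISO-8859-1")):
    """Hypothetical helper: try each encoding in order, return (DataFrame, encoding used)."""
    for encoding in encodings:
        try:
            return pd.read_csv(io.BytesIO(raw_bytes), encoding=encoding), encoding
        except UnicodeDecodeError:
            # This charset cannot decode the bytes, so move on to the next candidate
            continue
    raise ValueError(f"None of these encodings worked: {encodings}")


# Made-up CSV content, encoded the way an ISO-8859-1 file would be
sample_bytes = "Stand,PWR\nAdiós,A\n".encode("ISO-8859-1")

sample_df, used_encoding = read_csv_with_fallback(sample_bytes)
print(used_encoding)
print(sample_df)
```

Keep in mind that ISO-8859-1 can decode any byte sequence, so it never raises here; a successful read does not prove the guess was right, and it is still worth eyeballing the first rows.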
Yeaaay, we got it!! Now, to finish this first part, let's count how many rows and columns the dataset contains, list the column names and display a basic statistical overview.
print(f"Number of Rows: {df.shape[0]}")
print(f"Number of Columns: {df.shape[1]} ({list(df.columns)})")

Number of Rows: 156
Number of Columns: 7 (['Stand', 'PWR', 'SPD', 'RNG', 'PER', 'PRC', 'DEV'])
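The snippet above covers the row and column counts but not the statistical overview. A minimal sketch, assuming the overview refers to `DataFrame.describe()`, on a tiny made-up frame (the rows are hypothetical, not the real dataset):

```python
import pandas as pd

# Hypothetical rows that only mimic the layout of the dataset
toy = pd.DataFrame({
    "Stand": ["Stand A", "Stand B", "Stand C"],
    "PWR": ["B", "A", "B"],
    "SPD": ["B", "C", "A"],
})

print(f"Number of Rows: {toy.shape[0]}")
print(f"Number of Columns: {toy.shape[1]} ({list(toy.columns)})")

# For text columns, describe() reports count, unique, top (most frequent value) and freq
summary = toy.describe()
print(summary)
```

Since the stats are still letters at this point, the overview counts values instead of computing means; once the values are numbers, `describe()` switches to mean, std, min, max and quartiles.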
1) Operations

Operations-wise, we will cover the five main ones: renaming, selecting, updating, inserting and deleting/dropping.

1.1) Renaming

Renaming means changing the column names. In this part, let's rename the columns as follows:
New Features
PWR >> Power
SPD >> Speed
RNG >> Range
PER >> Stamina
PRC >> Precision
DEV >> Development_Potencial
# ---- Renaming Columns ----
new_names = {
    'PWR'    :  'Power'
    , 'SPD'  :  'Speed'
    , 'RNG'  :  'Range'
    , 'PER'  :  'Stamina'
    , 'PRC'  :  'Precision'
    , 'DEV'  :  'Development_Potencial'
}

df.rename(columns=new_names, inplace=True)
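One thing worth knowing about `rename`: by default, a key that matches no column is silently ignored, so a typo in the mapping goes unnoticed. A small sketch on a made-up one-row frame, using the `errors="raise"` option of `DataFrame.rename` to surface the mistake:

```python
import pandas as pd

# Made-up one-row frame with two of the original column names
toy = pd.DataFrame({"PWR": ["B"], "SPD": ["B"]})

# Without inplace=True, rename returns a new frame and leaves the original alone
renamed = toy.rename(columns={"PWR": "Power", "SPD": "Speed"})

# A misspelled key ("PWRR") is silently ignored by default...
unchanged = toy.rename(columns={"PWRR": "Power"})

# ...unless errors="raise" is passed, which turns the typo into a KeyError
try:
    toy.rename(columns={"PWRR": "Power"}, errors="raise")
    typo_detected = False
except KeyError:
    typo_detected = True

print(list(renamed.columns), list(unchanged.columns), typo_detected)
```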
1.2) Selecting

Now, let's select some columns and rows of our DataFrame. There are several ways to do it, so I will show just the most used ones here.
# ---- Selecting a Single Column ----
# ---- Selecting Multiple Columns ----
df[['Power', 'Speed', 'Development_Potencial']].head()
# ---- Selecting a Single Row ----
# ---- Selecting Multiple Rows ----
# ---- Selecting Rows with iloc ----
# ---- Selecting Rows with Conditions ----
# - selecting stands with Power and Development_Potencial stats equal to 'A'
df.loc[(df['Power'] == 'A') & (df['Development_Potencial'] == 'A')]
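Here is a rough sketch of the selection styles listed in the comments above, on a small made-up frame. The rows and the index labels 10 to 13 are assumptions, chosen so that labels and positions differ; the lesson's own calls may look different.

```python
import pandas as pd

# Hypothetical rows; the index labels start at 10 on purpose
toy = pd.DataFrame(
    {
        "Stand": ["Stand A", "Stand B", "Stand C", "Stand D"],
        "Power": ["A", "B", "A", "C"],
        "Development_Potencial": ["A", "A", "A", "B"],
    },
    index=[10, 11, 12, 13],
)

single_column = toy["Stand"]   # one column, returned as a Series
first_row = toy.loc[10]        # one row, looked up by its index label
by_label = toy.loc[10:12]      # labels 10, 11 and 12: the end label is included
by_position = toy.iloc[0:2]    # positions 0 and 1: the end position is excluded

# Rows where both conditions hold; each condition needs its own parentheses
strong = toy.loc[(toy["Power"] == "A") & (toy["Development_Potencial"] == "A")]

print(len(by_label), len(by_position))
print(strong)
```

The main design point: `loc` works with index labels and `iloc` with positions. They only look interchangeable while the index is the default 0, 1, 2, ...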
1.3) Updating

Updating means changing the values stored in the rows. In this example, let's apply an encoding to the values, that is, convert the strings to numbers:
None  >>    0
E     >>    1
D     >>    2
C     >>    3
B     >>    4
A     >>    5
Infi  >>  999
# ---- Updating Values ----
df.fillna(0, inplace=True)
df.replace('None', 0, inplace=True)
df.replace('E', 1, inplace=True)
df.replace('D', 2, inplace=True)
df.replace('C', 3, inplace=True)
df.replace('B', 4, inplace=True)
df.replace('A', 5, inplace=True)
df.replace('Infi', 999, inplace=True)
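The same encoding can also be written as a single dictionary lookup. This is a sketch of an alternative, not the approach used above: it assumes only the stat columns should be converted, and the second row is made up for illustration.

```python
import pandas as pd

# The encoding table from the lesson, as one dictionary
grade_to_number = {"None": 0, "E": 1, "D": 2, "C": 3, "B": 4, "A": 5, "Infi": 999}

toy = pd.DataFrame({
    "Stand": ["Anubis", "Hypothetical Stand"],  # second row is made up
    "Power": ["B", "Infi"],
    "Speed": ["B", None],
})

stat_columns = ["Power", "Speed"]
for column in stat_columns:
    # Missing values are treated like the 'None' grade, then every grade is looked up in the table
    toy[column] = toy[column].fillna("None").map(grade_to_number)

print(toy)
```

Restricting the conversion to the stat columns keeps the `Stand` names untouched, whereas a frame-wide `replace` would also rewrite a stand whose name happened to be exactly `A` or `None`.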

1.4) Deleting / Dropping

Now, let's delete / drop the first row. Adiós, Anubis Stand! Oh, and delete and drop mean the same thing; both terms are interchangeable.
# ---- Deleting / Dropping Rows ----
anubis_stand = df.iloc[0]
df.drop(0, inplace=True)

print(f'Anubis Stand: {anubis_stand}')
Anubis Stand: Stand                    Anubis
Power                         4
Speed                         4
Range                         1
Stamina                       5
Precision                     1
Development_Potencial         3
Name: 0, dtype: object
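A short sketch of how `drop` behaves, on a made-up frame (only the Anubis row mirrors the output above; the other rows are hypothetical). Note that the remaining rows keep their original index labels.

```python
import pandas as pd

toy = pd.DataFrame({
    "Stand": ["Anubis", "Stand B", "Stand C"],  # last two rows are hypothetical
    "Power": [4, 2, 5],
})

# Without inplace=True, drop returns a new frame and leaves toy untouched
without_first = toy.drop(0)

# Several index labels can be dropped at once
without_two = toy.drop([0, 2])

# The surviving rows keep their old labels (1, 2); reset_index renumbers them from 0
renumbered = without_first.reset_index(drop=True)

print(list(without_first.index), list(renumbered.index))
```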
1.5) Inserting

Hmmm, I like Anubis Stand, so let's add it again!!
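# ---- Inserting Rows ----
# - adding to the end, under the next unused index label
#   (after the drop, the label len(df.index) still belongs to the last row)
df.loc[df.index.max() + 1] = anubis_stand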
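One thing to watch when adding a row back after a drop: the index labels no longer start at 0, so `len(df.index)` can point at a label that is still in use, and assigning to it overwrites that row instead of adding a new one. A minimal sketch on made-up data (the rows other than Anubis are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "Stand": ["Anubis", "Stand B", "Stand C"],  # last two rows are hypothetical
    "Power": [4, 2, 5],
})

removed = toy.iloc[0]   # keep a copy of the row before dropping it
toy = toy.drop(0)       # the remaining labels are now 1 and 2

risky_label = len(toy.index)      # 2, which still belongs to "Stand C"
safe_label = toy.index.max() + 1  # 3, the next unused label

# Assigning to an unused label adds a new row at the end
toy.loc[safe_label] = removed

print(risky_label in [1, 2])
print(toy)
```

Another option is to call `reset_index(drop=True)` right after the drop, so that `len(df.index)` is again the next free label.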
Well, that was kind of a long lesson today, wasn't it? But you gotta agree with me, this lesson was amazing!
See you in the next post!! 👋