🐼 Pandas Basics: Introduction

date

Apr 2, 2023

0) Reading Dataset

To read datasets, we use the read_csv function. Before you ask me about the parameters: I will not cover all of them here, because there are a bunch of them. If you are interested in them, you can check them out in the Pandas read_csv documentation.

#
# ---- Reading CSV Dataset ----
#
import pandas as pd

filepath = "./datasets/jojo-stands.csv"
df = pd.read_csv(filepath)

Don't worry, I planned this error!!!

I'd guess that around 80% of the datasets you'll work with in the future will be encoded in UTF-8. However, especially if you live in a country where this charset is not the default, such as Japan, you will get the same error I got here: the file contains bytes that cannot be decoded as UTF-8.
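To see where this kind of error comes from without touching the dataset, here is a minimal sketch using only Python built-ins. The sample text is made up for illustration; it is not taken from the Stands dataset.

```python
# Hypothetical sample text; not taken from the dataset.
text = "café"

# Encode with ISO-8859-1 (Latin-1): 'é' becomes the single byte 0xE9.
raw = text.encode("iso-8859-1")

try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as err:
    # In UTF-8, 0xE9 starts a multi-byte sequence, but nothing follows it here.
    decoded_ok = False
    print(f"UTF-8 failed: {err.reason}")

# Decoding with the same encoding that wrote the bytes restores the text.
print(raw.decode("iso-8859-1"))
```

The same thing happens inside read_csv when the file was saved in one charset and read as another.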

To solve this, we will use the chardet library. It reads a fragment of the dataset and guesses which charset it uses. After that, you can try to read the dataset again with Pandas, passing the proper charset. To install chardet, run one of the following commands in your command prompt:

Using PIP

pip install chardet

Using Conda

conda install chardet

With the library installed, let's find out which charset the dataset uses.

#
# ---- Figuring Out Dataset Charset with Chardet ----
#
import chardet 

# Reading the first 100,000 bytes to guess the charset
with open(filepath, 'rb') as file:
    guessed_chardet = chardet.detect(file.read(100000))

print(guessed_chardet)

{'encoding': 'ISO-8859-1', 'confidence': 0.73, 'language': ''}

Hmm, so chardet is 73% confident that the charset is ISO-8859-1. Let's try to read the dataset with this charset. If we get the same error again, we can run chardet again, this time reading the first 200,000 bytes.

df = pd.read_csv(filepath, encoding='ISO-8859-1')

# If everything goes well, let's print the first 5 rows
df.head()
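If the guess turns out to be wrong, one possible fallback is to try a short list of candidate charsets in order. The sketch below is only an illustration: the helper name first_working_encoding and the candidate list are my own choices, not part of Pandas or chardet. Keep in mind that ISO-8859-1 maps every possible byte to some character, so it never raises an error; a read that succeeds does not prove the guess was right, so always check the result with df.head().

```python
def first_working_encoding(raw, candidates=("ascii", "utf-8", "iso-8859-1")):
    """Return the first candidate that decodes raw without errors (hypothetical helper)."""
    for encoding in candidates:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
    return None


# Made-up bytes for illustration; not taken from the dataset.
sample = "Señor".encode("iso-8859-1")
print(first_working_encoding(sample))
```

The returned name can then be passed to pd.read_csv through its encoding parameter.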

Yeaaay, we got it!! Now, to finish this first part, let's count how many rows and columns the dataset contains, list the column names and display a basic statistical overview.

print(f"Number of Rows: {df.shape[0]}")
print(f"Number of Columns: {df.shape[1]} ({list(df.columns)})")

df.describe()

Number of Rows: 156

Number of Columns: 7 (['Stand', 'PWR', 'SPD', 'RNG', 'PER', 'PRC', 'DEV'])
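A small note on describe(): when the columns hold text (like stat letters) instead of numbers, the overview shows count, unique, top and freq rather than mean and standard deviation. Here is a minimal sketch with a toy DataFrame; the rows are invented for illustration and are not from the dataset.

```python
import pandas as pd

# Toy rows invented for illustration; the values are not from the dataset.
toy = pd.DataFrame({
    "Stand": ["Alpha", "Beta", "Gamma"],
    "PWR": ["A", "B", "A"],
    "SPD": ["C", "C", "C"],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = toy.shape
print(f"Rows: {n_rows}, Columns: {n_cols}")

# With text columns only, describe() reports count, unique, top and freq.
summary = toy.describe()
print(summary)
```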

1) Operations

Operations-wise, we will cover the five main ones: renaming, selecting, updating, deleting/dropping and inserting.

1.1) Renaming

Renaming means changing the column names. In this part, let's rename the columns as follows:

New Features

PWR >> Power
SPD >> Speed
RNG >> Range
PER >> Stamina
PRC >> Precision
DEV >> Development_Potencial

#
# ---- Renaming Columns ----
#
new_names = {
    'PWR'    :  'Power'
    , 'SPD'  :  'Speed'
    , 'RNG'  :  'Range'
    , 'PER'  :  'Stamina'
    , 'PRC'  :  'Precision'
    , 'DEV'  :  'Development_Potencial'
}

df.rename(columns=new_names, inplace=True)
df.head()
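If you would rather keep the original DataFrame untouched, rename can also return a renamed copy. A minimal sketch, assuming a toy DataFrame with one invented row:

```python
import pandas as pd

# One invented row, just to have some columns to rename.
toy = pd.DataFrame({"Stand": ["Alpha"], "PWR": ["A"], "SPD": ["B"]})

# Without inplace=True, rename returns a new DataFrame
# and leaves the original one as it was.
renamed = toy.rename(columns={"PWR": "Power", "SPD": "Speed"})

print(list(toy.columns))
print(list(renamed.columns))
```

Which style to use is mostly a matter of taste; the copy version makes it easier to compare before and after.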

1.2) Selecting

Now, let's select some columns and rows of our DataFrame. There are several ways to do it, so I will show just the most used ones here.

#
# ---- Selecting a Single Column ----
#
df['Power'].head()

#
# ---- Selecting Multiple Columns ----
#
df[['Power', 'Speed', 'Development_Potencial']].head()

#
# ---- Selecting a Single Row ----
#
df[0:1]

#
# ---- Selecting Multiple Rows ----
#
df[0:10]

#
# ---- Selecting Rows with iloc ----
#
df.iloc[15:20]

#
# ---- Selecting Rows with Conditions ----
#
#
# - selecting stands with Power and Development_Potencial stats equal to 'A'
#
df.loc[(df['Power'] == 'A') & (df['Development_Potencial'] == 'A')]
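One detail that is easy to trip over: iloc selects by position, while loc selects by index label, and their slices treat the end differently. Here is a small sketch on a toy DataFrame; the rows and the index labels are invented so that labels and positions do not match.

```python
import pandas as pd

toy = pd.DataFrame(
    {"Stand": ["Alpha", "Beta", "Gamma", "Delta"], "Power": ["A", "B", "A", "C"]},
    index=[10, 20, 30, 40],  # labels chosen on purpose to differ from positions
)

# iloc slices by position: positions 0 and 1, the end is excluded.
by_position = toy.iloc[0:2]

# loc slices by label: labels 10 through 30, the end is included.
by_label = toy.loc[10:30]

# A boolean mask built with isin keeps rows whose Power is 'A' or 'B'.
strong = toy.loc[toy["Power"].isin(["A", "B"])]

print(by_position)
print(by_label)
print(strong)
```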

1.3) Updating

Updating means changing row values. In this example, let's apply an encoding to the values, that is, convert the strings to numbers:

Encoding

None  >>    0
E     >>    1
D     >>    2
C     >>    3
B     >>    4
A     >>    5
Infi  >>  999

#
# ---- Updating Values ----
#
df.fillna(0, inplace=True)
df.replace('None', 0, inplace=True)
df.replace('E', 1, inplace=True)
df.replace('D', 2, inplace=True)
df.replace('C', 3, inplace=True)
df.replace('B', 4, inplace=True)
df.replace('A', 5, inplace=True)
df.replace('Infi', 999, inplace=True)

df.head()
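The same encoding can also be written as a single dictionary and applied only to the stat columns, so the Stand names are never touched. This is just an alternative sketch on a toy DataFrame with invented rows; the names encoding and stat_columns are my own.

```python
import pandas as pd

# Invented rows; None stands in for a missing value in the file.
toy = pd.DataFrame({
    "Stand": ["Alpha", "Beta"],
    "Power": ["A", "None"],
    "Speed": ["Infi", None],
})

encoding = {"None": 0, "E": 1, "D": 2, "C": 3, "B": 4, "A": 5, "Infi": 999}
stat_columns = ["Power", "Speed"]

for column in stat_columns:
    # map() looks each value up in the dict; values that are not in it
    # (including missing ones) become NaN, which fillna(0) turns into 0.
    toy[column] = toy[column].map(encoding).fillna(0).astype(int)

print(toy)
```

Be aware that map() also turns any unexpected string into 0 here, so it is worth checking the unique values of each column first.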

1.4) Deleting / Dropping

Now, let's delete / drop the first row: adiós Anubis Stand! Oh, delete and drop mean the same thing; both terms are interchangeable.

#
# ---- Deleting / Dropping Rows ----
#
anubis_stand = df.iloc[0]
df.drop(0, inplace=True)

print(f'Anubis Stand: {anubis_stand}')
df.head()

Anubis Stand: Stand                    Anubis
Power                         4
Speed                         4
Range                         1
Stamina                       5
Precision                     1
Development_Potencial         3
Name: 0, dtype: object
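Note that drop removes rows by index label and leaves a gap in the labels afterwards. A minimal sketch on a toy DataFrame (invented rows) showing the gap and one way to renumber:

```python
import pandas as pd

toy = pd.DataFrame({"Stand": ["Alpha", "Beta", "Gamma"]})  # invented rows

# drop() removes the row with label 0; without inplace=True it returns a new DataFrame.
trimmed = toy.drop(0)
print(list(trimmed.index))

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels.
renumbered = trimmed.reset_index(drop=True)
print(list(renumbered.index))
```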

1.5) Inserting

Hmmm, I like Anubis Stand, so let's add it again!!

#
# ---- Inserting Rows ----
#
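# - adding to the end
# - len(df.index) would reuse the label of the current last row (label 0 was
#   dropped), so take the next unused label instead
#
df.loc[df.index.max() + 1] = anubis_stand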
df
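Another way to add a row back, which does not depend on picking an index label by hand, is pd.concat. The sketch below uses a toy DataFrame with invented rows; the Series named removed stands in for a row saved earlier.

```python
import pandas as pd

# Invented rows; the index starts at 1, as if label 0 had been dropped.
toy = pd.DataFrame({"Stand": ["Beta", "Gamma"], "Power": [4, 3]}, index=[1, 2])
removed = pd.Series({"Stand": "Alpha", "Power": 5})  # stands in for a saved row

# to_frame().T turns the Series into a one-row DataFrame;
# ignore_index=True renumbers the combined rows from 0.
restored = pd.concat([toy, removed.to_frame().T], ignore_index=True)

print(restored)
```

One side effect to keep in mind: after the concat, numeric columns may end up with the generic object dtype, so you might want to convert them back with astype.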

Well, it was kind of a long lesson today, wasn't it? But you gotta agree with me, this lesson was amazing!

See you in the next post!! 👋