🐼 Pandas Basics: Introduction
date: Apr 2, 2023
slug: pandas-basics-introduction
status: Published
tags: Pandas Basics
summary: A Pandas Library Introduction in Python.
type: Post
Last updated: Apr 3, 2023 12:50 AM
👋 What's shaking, bacon? In this new lesson I will give you a simple introduction to Pandas, a Python library for manipulating datasets in a variety of formats, such as CSV and XLSX.
First things first, let's install it by running one of the following commands in your Command Prompt:
Using PIP
pip install pandas
Using Conda
conda install pandas
Okay, now that we have Pandas installed, let's get going with the code!!
PS.: you can download the Jupyter Notebook and the dataset files from this lesson in this GitHub Repository pandas-basics.
0) Reading Dataset
To read datasets, we use the read_csv function. Before you ask me about the parameters: I will not cover all of them here, because there are a bunch of them. If you are interested in them, though, you can check them out in the Pandas read_csv documentation.
#
# ---- Reading CSV Dataset ----
#
import pandas as pd
filepath = "./datasets/jojo-stands.csv"
df = pd.read_csv(filepath)

Don't worry, I planned this error!!!
I wanna say that roughly 80% of the datasets you'll be working with in the future will be in the UTF-8 charset. However - especially if you live in a country where this charset is not the default, such as Japan - you will get the same error I got here: there are bytes that cannot be decoded as UTF-8.
To solve this, we will be using the chardet library. This library reads a fragment of the dataset and guesses which charset it is in. After that, you can try to read the dataset again with pandas, passing the proper charset. To install chardet, run one of the following commands in your command prompt:
Using PIP
pip install chardet
Using Conda
conda install chardet
With the library installed, let's find out which charset the dataset is in.
#
# ---- Figuring Out Dataset Charset with Chardet ----
#
import chardet
# Reading the first 100,000 bytes to guess the charset
with open(filepath, 'rb') as file:
    guessed_chardet = chardet.detect(file.read(100000))
print(guessed_chardet)
{'encoding': 'ISO-8859-1', 'confidence': 0.73, 'language': ''}
Hmm, so chardet is about 73% confident that the charset is ISO-8859-1. So let's try to read the dataset with this charset. If we get the same error again, we can run chardet again, this time reading the first 200,000 bytes.
df = pd.read_csv(filepath, encoding='ISO-8859-1')
# If everything goes well, let's print the first 5 rows
df.head()
Yeaaay, we got it!! Now, to finish this first part, let's count how many rows and columns the dataset contains, list the column names and display a basic statistical overview.
print(f"Number of Rows: {df.shape[0]}")
print(f"Number of Columns: {df.shape[1]} ({list(df.columns)})")
df.describe()
Number of Rows: 156
Number of Columns: 7 (['Stand', 'PWR', 'SPD', 'RNG', 'PER', 'PRC', 'DEV'])
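To see the encoding error from this part in isolation, here is a minimal sketch that does not need the lesson's dataset. It writes a made-up two-row CSV in ISO-8859-1 to a temporary file (the file name, the rows and the `utf8_failed` flag are my own assumptions), lets read_csv fail with its default UTF-8 decoding, and then retries with the explicit encoding:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: "é" is the single byte 0xE9 in ISO-8859-1,
# which is not a valid UTF-8 sequence when followed by a comma.
raw_bytes = "Stand,PWR\nCafé,A\nEcho,B\n".encode("ISO-8859-1")

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.csv")
    with open(sample_path, "wb") as file:
        file.write(raw_bytes)

    try:
        # read_csv decodes as UTF-8 unless told otherwise
        sample_df = pd.read_csv(sample_path)
        utf8_failed = False
    except UnicodeDecodeError:
        # Retry with the charset the file was actually written in
        utf8_failed = True
        sample_df = pd.read_csv(sample_path, encoding="ISO-8859-1")

print(f"UTF-8 read failed: {utf8_failed}")
print(sample_df)
```

In a real project you would normally get the fallback charset from a detector such as chardet rather than hard-coding it.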
1) Operations
Operations-wise, we will cover the five main ones: renaming, selecting, updating, inserting and deleting/dropping.
1.1) Renaming
Renaming means changing the column names. In this part, let's rename the columns as follows:
New Features
PWR >> Power
SPD >> Speed
RNG >> Range
PER >> Stamina
PRC >> Precision
DEV >> Development_Potencial
#
# ---- Renaming Columns ----
#
new_names = {
    'PWR': 'Power',
    'SPD': 'Speed',
    'RNG': 'Range',
    'PER': 'Stamina',
    'PRC': 'Precision',
    'DEV': 'Development_Potencial',
}
df.rename(columns=new_names, inplace=True)
df.head()
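As a side note on renaming, here is a minimal sketch of two rename behaviours that are easy to trip over. It uses a made-up two-row DataFrame (`toy` and its stand names are hypothetical, not the lesson's dataset). Without inplace=True, rename returns a new DataFrame and leaves the original alone, and by default keys that match no column are skipped without an error:

```python
import pandas as pd

# Hypothetical toy data, only for illustration
toy = pd.DataFrame({"Stand": ["Alpha", "Beta"], "PWR": ["A", "C"]})

# 'XYZ' is not a column of toy; with the default settings rename ignores it
renamed = toy.rename(columns={"PWR": "Power", "XYZ": "Unused"})

print(list(toy.columns))      # the original DataFrame is untouched
print(list(renamed.columns))  # only the matching column was renamed
```

So a typo in a key of the dictionary will not raise anything; the column just keeps its old name.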
1.2) Selecting
Now, let's select some columns and rows of our DataFrame. There are several ways to do it, so I will show just the most used ones here.
#
# ---- Selecting a Single Column ----
#
df['Power'].head()
#
# ---- Selecting Multiple Columns ----
#
df[['Power', 'Speed', 'Development_Potencial']].head()
#
# ---- Selecting a Single Row ----
#
df[0:1]
#
# ---- Selecting Multiple Rows ----
#
df[0:10]
#
# ---- Selecting Rows with iloc ----
#
df.iloc[15:20]
#
# ---- Selecting Rows with Conditions ----
#
#
# - selecting stands with Power and Development_Potencial stats equal to 'A'
#
df.loc[(df['Power'] == 'A') & (df['Development_Potencial'] == 'A')]
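The selections above mix positions and labels, which look the same on a freshly loaded dataset. A minimal sketch on a made-up DataFrame (the stand names and the index labels 10 to 40 are my own assumptions) shows the difference between iloc and loc, plus the parentheses that combined conditions need:

```python
import pandas as pd

# Hypothetical toy data with non-default labels, to tell loc and iloc apart
toy = pd.DataFrame(
    {
        "Stand": ["Alpha", "Beta", "Gamma", "Delta"],
        "Power": ["A", "B", "A", "C"],
        "Speed": ["A", "A", "C", "A"],
    },
    index=[10, 20, 30, 40],
)

by_position = toy.iloc[0:2]  # positions 0 and 1; the end position is excluded
by_label = toy.loc[10:30]    # labels 10 through 30; the end label is included

# Each condition gets its own parentheses because & binds tighter than ==
both_a = toy.loc[(toy["Power"] == "A") & (toy["Speed"] == "A")]

print(by_position)
print(by_label)
print(both_a)
```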
1.3) Updating
Updating means changing the values stored in the rows. In this example, let's apply an encoding to the values, that is, convert the strings to numbers:
Encoding
None >> 0
E >> 1
D >> 2
C >> 3
B >> 4
A >> 5
Infi >> 999
#
# ---- Updating Values ----
#
df.fillna(0, inplace=True)
df.replace('None', 0, inplace=True)
df.replace('E', 1, inplace=True)
df.replace('D', 2, inplace=True)
df.replace('C', 3, inplace=True)
df.replace('B', 4, inplace=True)
df.replace('A', 5, inplace=True)
df.replace('Infi', 999, inplace=True)
df.head()
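The same encoding can also be written as one dictionary applied only to the stat columns, so the Stand names are never touched by the replacement. This is a minimal sketch on a made-up DataFrame; the stand names and the `encoding` and `stat_columns` variables are mine, not part of the lesson's notebook:

```python
import pandas as pd

# Hypothetical toy data: 'None' as text, a real missing value, and 'Infi'
stats = pd.DataFrame(
    {
        "Stand": ["Alpha", "Beta", "Gamma"],
        "Power": ["A", "None", "Infi"],
        "Speed": ["B", None, "E"],
    }
)

encoding = {"None": 0, "E": 1, "D": 2, "C": 3, "B": 4, "A": 5, "Infi": 999}
stat_columns = ["Power", "Speed"]

encoded = stats.copy()
for column in stat_columns:
    # map() turns anything missing from the dictionary into NaN,
    # so fillna(0) covers real missing values as well
    encoded[column] = encoded[column].map(encoding).fillna(0).astype(int)

print(encoded)
```

One trade-off to keep in mind: with map, a misspelled grade that is not in the dictionary also ends up as 0, so it may be worth checking the distinct values first.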
1.4) Deleting / Dropping
Now, let's delete / drop the first row - adiós Anubis Stand! Oh, delete and drop mean the same thing here; both terms are interchangeable.
#
# ---- Deleting / Dropping Rows ----
#
anubis_stand = df.iloc[0]
df.drop(0, inplace=True)
print(f'Anubis Stand: {anubis_stand}')
df.head()
Anubis Stand: Stand                    Anubis
Power                         4
Speed                         4
Range                         1
Stamina                       5
Precision                     1
Development_Potencial         3
Name: 0, dtype: object
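One detail worth knowing about drop: it works on index labels, and the remaining labels are not renumbered afterwards. A minimal sketch on a made-up three-row DataFrame (the names are hypothetical):

```python
import pandas as pd

# Hypothetical toy data with the default labels 0, 1, 2
toy = pd.DataFrame({"Stand": ["Alpha", "Beta", "Gamma"]})

# Without inplace=True, drop returns a new DataFrame
without_first = toy.drop(0)

print(list(toy.index))            # the original still has all three labels
print(list(without_first.index))  # label 0 is gone, the others keep their values

# reset_index(drop=True) renumbers from 0 if you want a gap-free index
renumbered = without_first.reset_index(drop=True)
print(list(renumbered.index))
```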
1.5) Inserting
Hmmm, I like Anubis Stand, so let's add it again!!

#
# ---- Inserting Rows ----
#
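# - adding to the end
#
# Row 0 was dropped above, so the labels now run from 1 to 155 and
# len(df.index) is 155, a label that is already in use. Assigning to it
# would overwrite the last row, so we take the next unused label instead.
df.loc[df.index.max() + 1] = anubis_stand
df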
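When you add a row with .loc, the label you assign to matters: if it already exists, that row is overwritten instead of a new one being added. This can easily happen after a drop leaves a gap in the index. A minimal sketch on a made-up three-row DataFrame (all names here are hypothetical):

```python
import pandas as pd

# Hypothetical toy data with the default labels 0, 1, 2
toy = pd.DataFrame({"Stand": ["Alpha", "Beta", "Gamma"], "Power": [5, 4, 3]})
removed = toy.iloc[0]
toy = toy.drop(0)  # remaining labels: 1 and 2

collided = toy.copy()
# len() is 2 and label 2 already exists, so Gamma gets overwritten
collided.loc[len(collided.index)] = removed

appended = toy.copy()
# max label + 1 is 3, which is unused, so the row is really appended
appended.loc[appended.index.max() + 1] = removed

print(collided)
print(appended)
```

Another option is to call reset_index(drop=True) right after dropping, so that len(df.index) is always a free label.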
Well, that was kind of a long lesson today, wasn't it? But you gotta agree with me, this lesson was amazing!
See you in the next post!! 👋