📊 Data Visualization: Seaborn Cheat Sheet
date
Apr 1, 2023
slug
data-visualization-seaborn-cheat-sheet
status
Published
tags
Data Visualization Basics
summary
A Seaborn Cheat Sheet for your Data Analysis.
type
Post
Last updated
Apr 1, 2023 04:47 PM
👋 Hi guys, in this lesson I will show you a broad Seaborn Cheat Sheet so you know which plots are available in this library and which situations each one is best suited for. So let's go!!
PS.: you can get all the code shown and all the datasets used here in this GitHub Repository: data-visualization-basics-posts.
First of all, let’s load an image summarizing everything I will cover in this lesson:
from IPython.display import Image
Image('./datas/seaborn-plots.png')
-) Preparing the Libraries and Datasets
#
# ---- Importing the modules ----
#
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
#
# ---- Reading the datasets ----
#
spotify_data = pd.read_csv('./datas/spotify.csv'
, index_col='Date'
, parse_dates=True)
flight_data = pd.read_csv('./datas/flight_delays.csv'
, index_col='Month')
insurance_data = pd.read_csv('./datas/insurance.csv')
iris_data = pd.read_csv('./datas/iris.csv'
, index_col='Id')
0) Trend Plots
Trend Plots are used to show patterns of change, especially over time, such as hours, days, weeks, years and so on. Styles:
- Line Plots
0.1) Line Plots
Line Plots are useful to show how variables change over time. Commands:
All Variables
sns.lineplot(data=data)
Just a Few Variables
sns.lineplot(data=data1, label='label1')
sns.lineplot(data=data2, label='label2')
#
# ---- Showing a line plot with all columns ----
#
plt.figure(figsize=(10,7))
plt.title('Songs on Spotify: Number of Views per Date')
plt.xlabel('Date')
plt.ylabel('Number of Views')
sns.lineplot(data=spotify_data)
plt.show()
#
# ---- Line Plot with just a few columns/variables ----
#
plt.figure(figsize=(10,7))
plt.title('Spotify Songs: Number of Views per Date')
plt.xlabel('Date')
plt.ylabel('Number of Views')
sns.lineplot(data=spotify_data['Despacito'], label='Despacito')
sns.lineplot(data=spotify_data['Something Just Like This'], label='Something Just Like This')
sns.lineplot(data=spotify_data['Unforgettable'], label='Unforgettable')
plt.show()
1) Relationship Plots
Relationship Plots are used to show comparisons/relationships between the dataset variables. Styles:
- Bar Plot
- HeatMap Plot
- Scatter Plot
- Regression Plot
- Lm Plot
- Swarm Plot
1.0) Bar Plots
Bar Plots compare quantities that belong to different groups. Commands:
Simple Bar Plot
sns.barplot(data=data, x=x_data, y=y_data)
Bar Plot With Categorical Variable
sns.barplot(data=data, x=x_data, y=y_data, hue=categorical_variable)
#
# ---- Simple Bar Plot ----
#
plt.figure(figsize=(10,7))
plt.title("Monthly Delays of HA's Flights")
# positive values >> the flight was delayed
# negative values >> the flight was early
sns.barplot(data=flight_data, x=flight_data.index, y='HA')
plt.show()
#
# ---- Bar Plot with Categorical Variable ----
#
plt.figure(figsize=(10,7))
plt.title('Charges by BMI for Smokers and Non-Smokers')
sns.barplot(data=insurance_data[0:20]
, x='bmi'
, y='charges'
, hue='smoker')
plt.show()
1.1) HeatMap Plots
HeatMaps show patterns and correlations by colour-coding the values. They only accept numerical variables, so categorical variables must first be encoded as numbers. Commands:
Simple HeatMap
sns.heatmap(data=data, annot=True|False)
#
# ---- Simple HeatMap ----
#
plt.figure(figsize=(10,7))
plt.title('Flights Delays per Month')
plt.xlabel('Flights')
plt.ylabel('Months')
sns.heatmap(data=flight_data, annot=True)
plt.show()
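Because `sns.heatmap` only accepts numbers, a text column has to be converted before it can appear in the grid. Here is a minimal sketch, assuming a tiny made-up table shaped like `insurance_data` (the values are invented): the `smoker` column is mapped to 0/1 and the correlation matrix of the result is drawn.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# made-up rows shaped like insurance_data (values are invented)
sample = pd.DataFrame(
    {
        "bmi": [22.5, 31.0, 27.3, 35.8, 24.1, 29.9],
        "charges": [3200.0, 28500.0, 4100.0, 41000.0, 3600.0, 19800.0],
        "smoker": ["no", "yes", "no", "yes", "no", "yes"],
    }
)

# encode the categorical column as numbers: no -> 0, yes -> 1
encoded = sample.assign(smoker=sample["smoker"].map({"no": 0, "yes": 1}))

# pairwise correlations between the (now all-numeric) columns
corr = encoded.corr()

plt.figure(figsize=(10, 7))
plt.title("Correlation Between BMI, Charges and Smoker")
ax = sns.heatmap(data=corr, annot=True)
plt.show()
```

The 0/1 mapping is just one possible encoding; it works well here because the column has only two categories.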
1.2) Scatter Plot
Scatter Plots show how two numerical variables relate to each other by drawing every observation as a point. Commands:
Simple Scatter Plot
sns.scatterplot(data=data, x=x_data, y=y_data)
Scatter Plot with Categorical Variable
sns.scatterplot(data=data, x=x_data, y=y_data, hue=categorical_variable)
#
# ---- Simple Scatter Plot ----
#
plt.figure(figsize=(10,7))
plt.title('Charges by BMI')
plt.xlabel('BMI')
plt.ylabel('Charges')
sns.scatterplot(data=insurance_data, x='bmi', y='charges')
plt.show()
#
# ---- Scatter Plot with Categorical Variable ----
#
plt.figure(figsize=(10,7))
plt.title('Charges by BMI for Smokers and Non-Smokers')
plt.xlabel('BMI')
plt.ylabel('Charges')
sns.scatterplot(data=insurance_data, x='bmi', y='charges', hue='smoker')
plt.show()
1.3) Regression Plots
Regression Plots are like the scatter ones, but with a regression line displaying the relationship between the x and y variables. Commands:
Regression Plot
sns.regplot(data=data, x=x_data, y=y_data)
#
# ---- Regression Plot ----
#
plt.figure(figsize=(10,7))
plt.title('Charges by BMI')
plt.xlabel('BMI')
plt.ylabel('Charges')
# the higher the BMI, the more the person pays in charges
sns.regplot(data=insurance_data, x='bmi', y='charges')
plt.show()
1.4) LM Plots
LM Plots are like the regression ones, but with a categorical variable and a regression line for each categorical group. Commands:
LM Plot
sns.lmplot(data=data, x=x_data, y=y_data, hue=categorical_variable)
#
# ---- LM Plot ----
#
#
# we can assume that, for smokers, charges rise steeply
# as the BMI gets higher.
#
# the same holds for non-smokers, but their charges rise
# much more slowly than the smokers' ones
sns.lmplot(data=insurance_data, x='bmi', y='charges', hue='smoker');
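One practical difference from the previous plots: `sns.lmplot` creates its own figure, so a preceding `plt.figure(figsize=...)` does not size it. Here is a minimal sketch of setting the size, title and axis labels through the `FacetGrid` object it returns, assuming made-up data shaped like `insurance_data` (the values are invented):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# made-up rows shaped like insurance_data (values are invented)
sample = pd.DataFrame(
    {
        "bmi": [21.0, 24.5, 28.0, 31.5, 35.0, 22.0, 26.0, 30.0, 33.0, 36.5],
        "charges": [2500.0, 3100.0, 3900.0, 4300.0, 5200.0,
                    15000.0, 19500.0, 26000.0, 33000.0, 40500.0],
        "smoker": ["no"] * 5 + ["yes"] * 5,
    }
)

# height (inches) and aspect (width / height) take the place of figsize here
g = sns.lmplot(data=sample, x="bmi", y="charges", hue="smoker",
               height=7, aspect=10 / 7)

# the returned FacetGrid gives access to labels and to the single Axes
g.set_axis_labels("BMI", "Charges")
g.ax.set_title("Charges by BMI for Smokers and Non-Smokers")
plt.show()
```

The same idea applies to other Seaborn functions that build their own figure, such as `sns.jointplot`, which also takes a `height` argument.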
1.5) Swarm Plots
Swarm Plots show the relationship between a categorical variable and a numerical one. Commands:
Swarm Plot
sns.swarmplot(data=data, x=x_data, y=y_data)
#
# ---- Swarm Plot ----
#
plt.figure(figsize=(10,7))
plt.title('Charges for Smokers and Non-Smokers')
plt.xlabel('Smoker or Not')
plt.ylabel('Charges')
sns.swarmplot(data=insurance_data, x='smoker', y='charges')
plt.show()
2) Distribution Plots
Distribution Plots show how the data are distributed in the dataset and help the Data Scientist estimate the likely output values for specific inputs. Styles:
- Histogram Plots
- KDE Plots
- Joint Plots
2.0) Histogram Plots
Histogram Plots show the distribution of a single variable, with or without a categorical one. Commands:
Simple Hist Plot
sns.histplot(data=data, x=x_data)
Hist Plot with Categorical Variable
sns.histplot(data=data, x=x_data, hue=categorical_variable)
#
# ---- Simple Hist Plot ----
#
plt.figure(figsize=(10,7))
plt.title('Number of Flowers by Sepal Length (cm)')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Count')
sns.histplot(data=iris_data, x='Sepal Length (cm)')
plt.show()
#
# ---- Hist Plot with Categorical Variable ----
#
plt.figure(figsize=(10,7))
plt.title('Number of Flowers by Sepal Length (cm) and Species')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Count')
sns.histplot(data=iris_data, x='Sepal Length (cm)', hue='Species')
plt.show()
#
# ---- Simple Histogram of all numeric columns (one I really like) ----
#
plt.figure(figsize=(10,7))
plt.title('Number of Flowers by Measurement (cm)')
plt.xlabel('Measurement (cm)')
plt.ylabel('Count')
sns.histplot(data=iris_data)
plt.show()
2.1) KDE Plots
KDE Plots show a smoother distribution (compared to the histograms) of one variable with or without a categorical one. Commands:
Simple KDE Plot
sns.kdeplot(data=data, x=x_data, fill=True|False)
KDE Plot with Categorical Variable
sns.kdeplot(data=data, x=x_data, hue=categorical_variable, fill=True|False)
#
# ---- Simple KDE Plot ----
#
plt.figure(figsize=(10,7))
plt.title('Number of Flowers per Sepal Length (cm)')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Density')
sns.kdeplot(data=iris_data, x='Sepal Length (cm)', fill=True)
plt.show()
#
# ---- KDE Plot with Categorical Variable ----
#
plt.figure(figsize=(10,7))
plt.title('Number of Flowers by Sepal Length (cm) and Species')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Density')
sns.kdeplot(data=iris_data, x='Sepal Length (cm)', hue='Species', fill=True)
plt.show()
2.2) Joint Plot
Joint Plots show a combined, two-dimensional KDE Plot of two numerical variables, with the KDE Plot of each variable on its own drawn along the margins. Commands:
Joint Plot
sns.jointplot(data=data, x=x_data, y=y_data, kind='kde')
#
# ---- Joint Plot ----
#
sns.jointplot(data=iris_data
, x='Sepal Length (cm)'
, y='Petal Length (cm)'
, kind='kde');
Yeah, yeah, I know that this lesson has a little too much information, but you don't have to learn all of these things at once. Use this post to review every time you have to create plots and you will get used to it.
See ya in the next lesson!! 👋