🧠 Intro to Deep Learning 2: Binary Classification Problems

date
May 11, 2023
slug
intro-to-deep-learning-binary-classification-problems
status
Published
tags
Deep Learning
summary
Why just make predictions if we can also classify?
type
Post
Last updated
May 12, 2023 10:44 PM
Classification is another Supervised Learning method that we can apply with our networks. In the previous two notebooks we saw how to make predictions with Regression, but now, let's give special attention to Classification.
 
Besides, since you already know the basic network building blocks, such as Activation Functions, Optimizers and Loss Functions, I will be straight to the point in this notebook and only go deeper into the new concepts you will see here.
 
So, let's get hands-on!!
 
Topics:
- Accuracy and Cross-Entropy
- Sigmoid Activation Function
- Binary Classification in Python Code
 

 

0) Accuracy and Cross-Entropy

 
If you have studied or worked with Classification Problems in Machine Learning, you've probably stumbled upon Accuracy. It is an evaluation metric, like Mean Absolute Error (*MAE*), Mean Squared Error (*MSE*) and the Huber Loss Function, but intended for Classification Problems rather than Regression ones.
 
Accuracy works by measuring the ratio of correct predictions (True Positives and True Negatives) to total predictions (True Positives, True Negatives, False Positives and False Negatives), given by the equation:
 
accuracy = (true_positives + true_negatives) / (true_positives + true_negatives + false_positives + false_negatives)
 

 
True Positive (TP) - the model predicted true, and the real outcome is true; ✔️
 
True Negative (TN) - the model predicted false, and the real outcome is false; ✔️
 
False Positive (FP) - the model predicted true, and the real outcome is false; ❌
 
False Negative (FN) - the model predicted false, and the real outcome is true. ❌
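 
To make those counts concrete, here is a minimal sketch of my own (not part of the original notebook) that tallies the four outcomes by hand for a pair of hypothetical 0/1 label arrays and computes the Accuracy from them:
 
import numpy as np # pip install numpy

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # hypothetical real outcomes
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])  # hypothetical model predictions

true_positives  = np.sum((y_pred == 1) & (y_true == 1))
true_negatives  = np.sum((y_pred == 0) & (y_true == 0))
false_positives = np.sum((y_pred == 1) & (y_true == 0))
false_negatives = np.sum((y_pred == 0) & (y_true == 1))

accuracy = (true_positives + true_negatives) / \
           (true_positives + true_negatives + false_positives + false_negatives)
print(accuracy)  # 0.75 for the arrays above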
 

 
However, this metric, like most classification metrics used in Machine Learning, cannot be used as a Loss Function in Deep Learning, because our Stochastic Gradient Descent (*SGD*), AKA the Optimizer, needs a Loss Function that changes smoothly, but Accuracy, being a ratio of counts, changes in jumps. So we have to choose a substitute to be the Loss Function, and guess what? This is when the Cross-Entropy Function comes into action.
 
Taking a look at the image below, you can see that the curve is smooth, that is, there are no sharp corners along its trajectory!!
 
[Image: Cross-Entropy Loss curve]
 
The idea here is that we want our network to predict the correct class with a probability close to 1.0, that is, as close as possible to 100% certainty about the correct class. The further the predicted probability is from 1.0, the greater the Cross-Entropy Loss will be.
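 
Here is a minimal sketch of that idea (my own illustration, using the standard binary cross-entropy formula rather than code from the original post); notice how the loss grows as the predicted probability for the true class moves away from 1.0:
 
import numpy as np # pip install numpy

def binary_cross_entropy(y_true, p_pred):
    # standard formula: -[y*log(p) + (1 - y)*log(1 - p)]
    return -(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

for p in [0.99, 0.9, 0.7, 0.5, 0.1]:
    print(f"true class = 1 | predicted prob = {p:.2f} | loss = {binary_cross_entropy(1, p):.4f}")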
 

 

1) Sigmoid Function

 
The Loss Function is not the only thing we have to change for Classification Problems; the Activation Function of our output layer has to change too. A great choice to replace the Linear Function used for Regression is the Sigmoid one.
 
The Sigmoid Function works much like Logistic Regression, that is, it takes the output of the previous layer and converts it to a range between 0.0 and 1.0, and, according to that output value, the network classifies the input into one of the two possible classes.
 

 
Output Value Greater than or Equal to the Threshold - classifies the input as class A;
Output Value Smaller than the Threshold - classifies the input as class B.
 

 
Usually, 0.5 is the default value for the Threshold used by Keras, but you can change this value in code.
 
The image below illustrates this.
 
[Image: Sigmoid Function with the classification Threshold]
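 
As a quick illustration (a sketch of my own, not from the original notebook), the snippet below squashes a few hypothetical raw outputs with the Sigmoid and then applies the 0.5 Threshold to turn probabilities into classes:
 
import numpy as np # pip install numpy

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

raw_outputs = np.array([-3.0, -0.5, 0.0, 0.8, 2.5])  # hypothetical pre-activation values
probabilities = sigmoid(raw_outputs)

threshold = 0.5  # default decision threshold
classes = (probabilities >= threshold).astype(int)

print(probabilities.round(3))  # [0.047 0.378 0.5   0.69  0.924]
print(classes)                 # [0 0 1 1 1]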
 

 

2) Binary Classification in Python Code

 
Let's see how to apply this new knowledge in Python Code!!
 
import pandas as pd # pip install pandas
import matplotlib.pyplot as plt # pip install matplotlib

# ---- Reading Dataset ----
df = pd.read_csv('../datasets/ion.csv', index_col=0)
display(df.head())

# ---- Encoding Classes ----
#
# \ good  >>  0
# \ bad   >>  1
#
df['Class'] = df['Class'].map({'good': 0, 'bad': 1})

# ---- Splitting Dataset into Train and Validation ----
df_train = df.sample(frac=0.7, random_state=0)
df_valid = df.drop(df_train.index)

# ---- Scaling the Train and Validation Splits ----
max_ = df_train.max(axis=0)
min_ = df_train.min(axis=0)
df_train = (df_train - min_) / (max_ - min_)
df_valid = (df_valid - min_) / (max_ - min_)

# ---- Dropping Columns with Missing Values (constant columns turn into NaN after scaling) ----
df_train.dropna(axis=1, inplace=True)
df_valid.dropna(axis=1, inplace=True)

# ---- Splitting Datasets into Features (X) and Targets (y) ----
X_train = df_train.drop('Class', axis=1)
X_valid = df_valid.drop('Class', axis=1)
y_train = df_train['Class']
y_valid = df_valid['Class']
[Image: preview of the dataset from df.head()]
 
# pip install tensorflow
from tensorflow import keras
from tensorflow.keras import layers

# ---- Creating the Model ----
model = keras.Sequential([
    # hidden layers
    layers.Dense(units=4, activation='relu', input_shape=[33])
    , layers.Dense(units=4, activation='relu')
    
    # output layer
    , layers.Dense(units=1, activation='sigmoid') # Sigmoid Activation Function goes here
])

# ---- Assigning Optimizer and Loss Function ----
model.compile(
    optimizer='adam'
    , loss='binary_crossentropy'
    , metrics=['binary_accuracy']
)
               
# ---- Summarizing Model ----
model.summary()
Output:
Model: "sequential_15"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_45 (Dense)            (None, 4)                 136       
                                                                 
 dense_46 (Dense)            (None, 4)                 20        
                                                                 
 dense_47 (Dense)            (None, 1)                 5         
                                                                 
=================================================================
Total params: 161
Trainable params: 161
Non-trainable params: 0
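 
As a quick aside of mine (not part of the original output), the parameter counts above are just weights plus biases for each Dense layer, that is, inputs * units + units:
 
print(33 * 4 + 4)  # first hidden layer  -> 136
print(4 * 4 + 4)   # second hidden layer -> 20
print(4 * 1 + 1)   # output layer        -> 5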
 
# ---- Early Stopping Strategy ----
early_stopping = keras.callbacks.EarlyStopping(
    min_delta=0.001              # minimum change in the monitored value to count as an improvement
    , patience=10                # number of epochs without improvement to wait before stopping
    , restore_best_weights=True  # restore the weights from the best epoch seen during training
)

# ---- Training the Model ----
history = model.fit(
    X_train, y_train
    , validation_data=(X_valid, y_valid)
    , batch_size=512
    , epochs=1000
    , callbacks=[early_stopping]
    , verbose=0
)

# ---- Plotting the Results ----
history_df = pd.DataFrame(history.history)

history_df.loc[:, ['loss', 'val_loss']].plot()
plt.title('Loss per Epoch')
plt.xlabel('Epoch')
plt.ylabel('Loss')

history_df.loc[:, ['binary_accuracy', 'val_binary_accuracy']].plot()
plt.title('Accuracy per Epoch')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')

# ---- Best Loss and Accuracy on Validation Step ----
print(("Best Validation Loss: {:0.4f}" +\
      "\nBest Validation Accuracy: {:0.4f}")\
      .format(history_df['val_loss'].min(), 
              history_df['val_binary_accuracy'].max()))
Best Validation Loss: 0.3043
Best Validation Accuracy: 0.9048
[Image: Loss per Epoch plot]
[Image: Accuracy per Epoch plot]
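 
And, coming back to what was said about the Threshold, here is a minimal sketch (an assumed follow-up of mine, not in the original post) of how the trained model's probabilities could be turned into classes with the default or a stricter threshold:
 
# ---- Predicting Probabilities and Applying a Threshold ----
probabilities = model.predict(X_valid)  # Sigmoid outputs in the range (0, 1)

default_classes = (probabilities >= 0.5).astype(int)  # default threshold
strict_classes  = (probabilities >= 0.7).astype(int)  # hypothetical stricter threshold

print(probabilities[:5].ravel())
print(default_classes[:5].ravel())
print(strict_classes[:5].ravel())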
 

 
OBS. - this example works only for Binary Classification; if you're working with Multiclass Classification, consider looking into the appropriate Optimizers, Loss Functions and Metrics.