Classes Weighting

General information

Classes weights can be used in order to balance the dataset. They give all the classes equal importance on gradient updates when model is training regardless of how many samples we have from each class in the training data. This prevents models from predicting the more frequent class more often just because it’s more common.


Brick Locations

Bricks Machine Learning → Classes Weighting

Brick Parameters

  • Target variable
    • The column which contains target variable of classes.
  • Weights adjustment
    • In order to activate it, run the pipeline.
      There you can enter the weights or use ‘Auto Suggestion’ to do it in the optimized way.

Brick Inputs/Outputs

  • Inputs
    • Brick takes the classified dataset.
  • Outputs
    • Brick produces the dataset with a weighted column.

Example of usage

Let's consider the binary classification problem
Binary classification : Titanic
The dataset is unbalanced since Survived column contains less zero than one. That is why it may be useful to do classes weighting before using machine learning for classification.
We use simple Logistic Regression to classify data and get following result. We notice that even for class 1 we get more predictions of 0 because of imbalance in initial data.
notion image
Let’s add Class Weighting brick to see what will change.
As a ‘Target variable’ we choose ‘survived’ and use ‘Auto Suggestion’.
notion image
notion image
As a result, we get the dataset with a column “normalized weight”.
notion image
Then, to train our model on weighted dataset we choose in sidebar weighting as Balancing method and for Weights column choose our column with normalized weights.
notion image
As a result, we improved all the metrics and increased predicted values ‘1’ in the actual class ‘1’.
notion image