Classes Weighting

General information

Classes weights can be used in order to balance the dataset. They give all the classes equal importance on gradient updates when model is training regardless of how many samples we have from each class in the training data. This prevents models from predicting the more frequent class more often just because it’s more common.

Description

Brick Locations

Bricks → Machine Learning → Classes Weighting

Brick Parameters

Target variable

The column which contains target variable of classes.

Weights adjustment

In order to activate it, run the pipeline.

There you can enter the weights or use ‘Auto Suggestion’ to do it in the optimized way.

Brick Inputs/Outputs

Inputs

Brick takes the classified dataset.

Outputs

Brick produces the dataset with a weighted column.

Example of usage

Let's consider the binary classification problem

🛥️

Binary classification : Titanic

The dataset is unbalanced since Survived column contains less zero than one. That is why it may be useful to do classes weighting before using machine learning for classification.

We use simple Logistic Regression to classify data and get following result. We notice that even for class 1 we get more predictions of 0 because of imbalance in initial data.

Let’s add Class Weighting brick to see what will change.

As a ‘Target variable’ we choose ‘survived’ and use ‘Auto Suggestion’.

As a result, we get the dataset with a column “normalized weight”.

Then, to train our model on weighted dataset we choose in sidebar weighting as Balancing method and for Weights column choose our column with normalized weights.

As a result, we improved all the metrics and increased predicted values ‘1’ in the actual class ‘1’.