General Information
This brick provides an easy interface for creating your own out-of-the-box, distributed gradient-boosted decision tree model for multiclass classification tasks (you can use this brick for binary classification as well, but we recommend using LGBM Binary instead). Thanks to its leaf-wise tree growth, the created model can be trained efficiently on large datasets while giving formidable results.
The models are built on three important principles:
- Weak learners
- Gradient Optimization
- Boosting Technique
In this case, the weak learners are multiple sequential, specialized decision trees that work as follows:
- The first tree learns to fit the target variable
- The second tree learns to fit the difference between the first tree's predictions and the ground truth (real data)
- Each subsequent tree learns to fit the residuals left by the previous trees, and so on
All these trees are trained by propagating the error gradients through the system.
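A minimal sketch of this residual-fitting principle, written with plain scikit-learn regression trees purely for illustration (this is not the brick's internal implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.zeros_like(y)  # start from a constant (zero) prediction
trees = []
for _ in range(100):
    residuals = y - prediction  # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)  # weak learner
    prediction += learning_rate * tree.predict(X)  # boosting step
    trees.append(tree)

# The final model is the (scaled) sum of all trees' outputs
final_prediction = sum(learning_rate * t.predict(X) for t in trees)
```

Each tree on its own is weak, but because every new tree corrects the remaining errors of the ensemble, the summed prediction steadily improves.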
The main drawback of LGBM Multiclass is that finding the best split points in each tree node is both a time-consuming and memory-consuming operation.
Description
Brick Locations
Bricks → Analytics → Data Mining / AI → Classification Models → LGBM Multiclass
Brick Parameters
- Learning rate
The boosting learning rate. This parameter controls how quickly or slowly the algorithm learns a problem. Generally, a larger learning rate allows the model to learn faster, while a smaller learning rate leads to a more optimal outcome (a sketch showing how this and the next two parameters map onto the underlying library is given after this parameter list).
- Number of iterations
The number of boosting iterations. It is recommended to set this parameter inversely to the selected learning rate (decrease one while increasing the other).
- Number of leaves
The main parameter to control model complexity. Higher values should increase accuracy but might lead to overfitting.
- Prediction mode
This parameter specifies the format in which the model predicts the target variable:
- Class - get the prediction as a single value of the 'closest' target class for each data point. This creates only one column with the "predicted_" prefix
- Probability of class - get the numerical probability of each class in a separate column.
- Target Variable
The column that holds the values you are trying to predict. Note that the column must contain two or more unique values and no missing values; a corresponding error message is shown otherwise.
- Optimize
This checkbox enables Bayesian hyperparameter optimization, which tweaks the learning rate, the number of iterations, and the number of leaves to find the best model configuration in terms of metrics (a sketch of this kind of search is given under Recommendations below).
Be aware that this process is time-consuming.
- Filter Columns
If your data set contains columns that should be ignored (but not removed from the data set) during training (and later during prediction), specify them in this parameter. To select multiple columns, click the '+' button in the brick settings.
In addition, you can ignore all columns except the selected ones by enabling the "Remove all except selected" option. This may be useful if you have a large number of columns while the model should be trained on just some of them.
- Brick frozen
This parameter enables the frozen run for this brick: the trained model is saved, and the training process is skipped during future runs, which may be useful after pipeline deployment.
This option appears only after a successful regular run.
Note that a frozen run will not be executed if the input data structure has changed.
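Since the brick's name suggests it is built on LightGBM, the sketch below shows roughly how the parameters above could map onto the standard `lightgbm` Python API. The data set, column names, and ignored column are made-up illustrations, and the brick's actual internals may differ:

```python
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification

# Toy data with a 3-class target, standing in for the brick's input data set
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_classes=3, random_state=42)
df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])
df["target"] = y

# "Filter Columns": train on everything except the ignored columns
ignored = ["feature_5"]  # hypothetical column to ignore
features = df.drop(columns=ignored + ["target"])

model = LGBMClassifier(
    learning_rate=0.1,   # "Learning rate"
    n_estimators=100,    # "Number of iterations"
    num_leaves=31,       # "Number of leaves"
)
model.fit(features, df["target"])

# "Prediction mode":
# Class - a single column with the "predicted_" prefix
df["predicted_target"] = model.predict(features)
# Probability of class - one probability column per class
proba = model.predict_proba(features)
for i, cls in enumerate(model.classes_):
    df[f"predicted_proba_{cls}"] = proba[:, i]
```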
Brick Inputs/Outputs
- Inputs
The brick takes a data set with a target column that contains two or more unique values.
- Outputs
The brick produces two outputs:
- Data - the modified input data set with added columns for the predicted classes or class probabilities
- Model - the trained model, which can be used as an input to other bricks
Additional Features
Use
To use the algorithm, you need to specify the learning rate, the number of iterations, and the number of leaves.
Recommendations
- Note that a higher number of leaves leads to increased accuracy, but raises the chance of overfitting.
- It is suggested to set the number of boosting iterations inversely to the learning rate (decrease one while increasing the other).
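The "Optimize" checkbox described above automates this kind of tuning. As a hedged sketch of what such a search can look like, here is a small example using the `optuna` library (whose default TPE sampler is a Bayesian-style optimizer) on a made-up data set; the brick's actual optimizer and search ranges are not documented here:

```python
import optuna
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Made-up 3-class data set standing in for the brick's input
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_classes=3, random_state=42)

def objective(trial):
    # Search ranges are illustrative, not the brick's actual ones
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "num_leaves": trial.suggest_int("num_leaves", 15, 127),
    }
    model = LGBMClassifier(**params)
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
print(study.best_params)
```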