Scorecard

General information

Credit scorecards are a popular way to quantify the probability that a client will demonstrate some defined behavior, such as loan default, bankruptcy, or payment without delinquency. Clients are described by a set of attributes, and each attribute is assigned a partial score that represents its contribution to the final score: the higher the score, the stronger the tendency to demonstrate the target behavior. The partial scores can be learned from historical data that connect the clients' features with the target behavior; the commonly used technique in this case is logistic regression. The coefficients of the logistic regression model can be transformed into partial scores by scaling, so that they reflect the impact of each attribute on the final decision and produce the expected score range.

There are two popular approaches to scaling the logistic regression (LogReg) coefficients ($\beta_i$ - LogReg coefficient for the variable $x_i$, $\beta_0$ - LogReg intercept, $n$ - number of independent variables):

Min-max scaling - transform the LogReg coefficients into partial scores so that the final client scores fall within the range between the minimum and the maximum score.

Odds-based scaling - transform the LogReg coefficients into partial scores so that the final client scores reflect the credit default odds. This method depends on three parameters: target odds, target score, and points to double the odds (pdo). For example, a target score of 600 can correspond to good-to-bad odds of 50 to 1 (the target odds), with every increase of 20 points doubling the odds.
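The relationship between the three odds-based parameters can be illustrated with a minimal sketch. It assumes the widely used convention score = offset + factor * ln(odds), with factor = pdo / ln(2); the function names are hypothetical and the brick's internal implementation may differ.

```python
import math

def odds_scaling_params(target_score, target_odds, pdo):
    """Return (factor, offset) so that score = offset + factor * ln(good/bad odds).

    Assumed convention: the score grows by `pdo` points each time the odds double.
    """
    factor = pdo / math.log(2)
    offset = target_score - factor * math.log(target_odds)
    return factor, offset

def score_from_odds(good_bad_odds, factor, offset):
    """Map good-to-bad odds to a score on the chosen scale."""
    return offset + factor * math.log(good_bad_odds)

# Parameters from the example above: 600 points at 50:1 odds, pdo = 20.
factor, offset = odds_scaling_params(target_score=600, target_odds=50, pdo=20)
print(round(score_from_odds(50, factor, offset)))   # 600
print(round(score_from_odds(100, factor, offset)))  # 620 (odds doubled)
```

With these parameters, halving the odds to 25:1 would lower the score by the same 20 points.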

The scaling procedure can be extended with a WoE (weight of evidence) correction, which takes into account the dependencies observed in new labeled data.

Description

Brick Location

Bricks → Machine Learning → Scorecard

Bricks → Analytics → Credit Scoring → Scorecard

Bricks → Use Cases → Credit Scoring → Credit Scoring Model → Scorecard

Brick Parameters

Scaling

The type of final score representation. Two strategies are available: Min-Max scaling and Odds scaling.

Min Score

The minimum client score expected after the Scorecard brick is applied to the trained logistic regression model. This score is expected to describe the worst case.

Max Score

The maximum client score expected after the Scorecard brick is applied to the trained logistic regression model. This score is expected to describe the best case.
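One way to picture how Min Score and Max Score could shape the partial scores is the sketch below. It is an assumption, not the brick's documented formula: it treats all binary features as independent (ignoring that one-hot columns of the same variable exclude each other), takes the model to predict the "bad" outcome, and uses a hypothetical function name and made-up coefficients.

```python
def min_max_partial_scores(coefs, min_score, max_score):
    """Linearly rescale LogReg coefficients of binary features into points.

    Returns (base, partial): a client's score is base plus the sum of
    partial[f] over the features f that are active (equal to 1).
    """
    worst = sum(max(b, 0.0) for b in coefs.values())  # all risk-increasing features on
    best = sum(min(b, 0.0) for b in coefs.values())   # all risk-decreasing features on
    spread = worst - best                             # equals the sum of |beta_i|
    if spread == 0:
        raise ValueError("all coefficients are zero; nothing to scale")
    k = (max_score - min_score) / spread
    base = min_score + k * worst
    # The model predicts the bad outcome, so points take the opposite sign.
    partial = {name: -k * b for name, b in coefs.items()}
    return base, partial

# Made-up coefficients for illustration only.
base, partial = min_max_partial_scores({"a": 1.0, "b": -0.5, "c": 2.0}, 0, 100)
lowest = base + sum(p for p in partial.values() if p < 0)
highest = base + sum(p for p in partial.values() if p > 0)
print(round(lowest, 6), round(highest, 6))
```

Under this mapping the intercept cancels out, because shifting every raw score by the same constant does not change its position within the range.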

Target odds and Target Score

Odds-based scaling parameters: a client with the "target score" has good-to-bad odds of "target odds" to 1.

Points to double the odds

The number of points by which the score increases when the good/bad odds double.
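The sketch below shows how the odds-based parameters might turn LogReg coefficients into per-attribute points. It assumes the common textbook formula in which the intercept and the offset are spread evenly across the n features, and a model that predicts the "bad" outcome; the function names, coefficients, and intercept are hypothetical, and the brick's own computation is not confirmed here.

```python
import math

def odds_partial_scores(coefs, intercept, target_score, target_odds, pdo):
    """Points per binary feature: {feature: (points_if_0, points_if_1)}."""
    factor = pdo / math.log(2)
    offset = target_score - factor * math.log(target_odds)
    n = len(coefs)
    # Each feature carries an equal share of the offset and the intercept.
    share = offset / n - factor * intercept / n
    # Minus sign: a positive coefficient raises the bad odds, so it lowers the score.
    return {name: (share, share - factor * beta) for name, beta in coefs.items()}

def total_score(points, features):
    """Sum the points that match the client's 0/1 feature values."""
    return sum(points[name][features[name]] for name in points)

# Hypothetical model whose baseline client has good-to-bad odds of exactly 20:1.
coefs = {"sex_female": -1.2, "pclass_3": 0.9}
points = odds_partial_scores(coefs, intercept=-math.log(20),
                             target_score=600, target_odds=20, pdo=20)
baseline = total_score(points, {"sex_female": 0, "pclass_3": 0})
third_class = total_score(points, {"sex_female": 0, "pclass_3": 1})
print(round(baseline), third_class < baseline)
```

The baseline client lands on the target score, and switching on a feature with a positive coefficient lowers the total.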

WoE correction

A binary flag that enables score adjustment based on WoE.

Target

A binary variable used as the target in a binary classification problem. The weight of evidence of each attribute is calculated with respect to the specified target. The target variable must be present in the input dataset and take two values, 0 and 1.
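As an illustration of how the weight of evidence of an attribute can be computed against a 0/1 target, here is a minimal sketch. It assumes the convention WoE = ln(share of goods / share of bads) with 0 as the good class; the brick may use the opposite sign or a smoothing rule, and the function name and sample data are hypothetical.

```python
import math
from collections import Counter

def weight_of_evidence(categories, targets):
    """WoE per category for a binary target (0 = good, 1 = bad).

    Categories with no goods or no bads get None, since the ratio is undefined.
    """
    good = Counter(c for c, t in zip(categories, targets) if t == 0)
    bad = Counter(c for c, t in zip(categories, targets) if t == 1)
    total_good, total_bad = sum(good.values()), sum(bad.values())
    woe = {}
    for c in set(categories):
        if good[c] == 0 or bad[c] == 0:
            woe[c] = None
        else:
            woe[c] = math.log((good[c] / total_good) / (bad[c] / total_bad))
    return woe

# Made-up sample: 3 of 4 women and 1 of 4 men are in the good class.
categories = ["female"] * 4 + ["male"] * 4
targets = [0, 0, 0, 1, 0, 1, 1, 1]
woe = weight_of_evidence(categories, targets)
print(woe)
```

A positive WoE marks a category in which goods are over-represented relative to the whole sample.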

Columns

List of columns available for selection. Several columns can be chosen by clicking the '+' button in the brick settings, and they are then processed in one of two ways:

remove all the selected columns from the dataset and use the remaining ones as predictors

use the selected columns as predictors

Remove all except selected

A binary flag that determines how the selected columns are treated: kept as the only predictors or removed from the dataset.
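The interplay between Columns and Remove all except selected can be sketched as follows. The mapping of the flag to the two behaviors is inferred from its name, the exclusion of the target column is an assumption, and the function is hypothetical rather than part of the brick.

```python
def select_predictors(all_columns, selected, remove_all_except_selected, target):
    """Return the predictor columns implied by the selection settings."""
    chosen = set(selected)
    if remove_all_except_selected:
        # Keep only the selected columns.
        kept = [c for c in all_columns if c in chosen]
    else:
        # Drop the selected columns and keep the rest.
        kept = [c for c in all_columns if c not in chosen]
    # Assumption: the target is never used as a predictor.
    return [c for c in kept if c != target]

columns = ["passengerid", "name", "pclass", "sex", "age", "survived"]
print(select_predictors(columns, ["passengerid", "name"], False, "survived"))
print(select_predictors(columns, ["pclass", "sex"], True, "survived"))
```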

Notes

The Scorecard brick requires a binary feature vector. Please make sure that the scoring model meets this requirement.

Scorecard adjustment with data sampling requires the WoE and the LogReg coefficient to be consistent for the corresponding categories; otherwise, these categories are excluded from the resulting scorecard.

Example of usage

Let's consider a binary classification problem on the Titanic passenger dataset. The inverse target variable takes two values: survived (0), the good or non-event case, and not-survived (1), the bad or event case. General information about the predictors is given below:

passengerid (category) - ID of passenger

name (category) - Passenger's name

pclass (category) - Ticket class

sex (category) - Gender

age (numeric) - Age in years

sibsp (numeric) - Number of siblings / spouses aboard the Titanic

parch (category) - Number of parents / children aboard the Titanic

ticket (category) - Ticket number

fare (numeric) - Passenger fare

cabin (category) - Cabin number

embarked (category) - Port of Embarkation

💡

We need to build a scorecard that assesses the Survival Score of a passenger.

Data Processing Pipeline description

To get the Scorecard, we need to train a logistic regression model that returns the probability of not surviving (the target is 0 if the passenger survived and 1 otherwise). The feature vector of the logistic regression must be in binary form, so some data preprocessing is needed: encoding of the categorical features, and binning followed by encoding of the numerical ones.
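The preprocessing described above can be sketched in plain Python. The bin edges, the category lists, and the helper names are hypothetical choices made for illustration, and missing values are not handled; in the pipeline itself this work is done by dedicated encoding and binning steps.

```python
from bisect import bisect_right

def one_hot(prefix, value, categories):
    """Binary indicator per category, e.g. sex_female / sex_male."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}

def bin_and_encode(prefix, value, edges):
    """Assign a numeric value to an interval, then one-hot encode the interval.

    Intervals: below edges[0], [edges[i], edges[i+1]), and edges[-1] or above.
    """
    labels = ([f"<{edges[0]}"]
              + [f"{lo}-{hi}" for lo, hi in zip(edges, edges[1:])]
              + [f">={edges[-1]}"])
    return one_hot(prefix, labels[bisect_right(edges, value)], labels)

# Hypothetical passenger and bin edges.
passenger = {"sex": "female", "pclass": 1, "age": 8}
features = {}
features.update(one_hot("sex", passenger["sex"], ["female", "male"]))
features.update(one_hot("pclass", passenger["pclass"], [1, 2, 3]))
features.update(bin_and_encode("age", passenger["age"], [16, 40, 60]))
print(features)
```

Every resulting value is 0 or 1, which is the form the Scorecard brick expects from the model's feature vector.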

The demo pipeline with the Scorecard brick is shown below.

The Scorecard brick should be connected to Logistic Regression and can also accept a dataset with the corresponding structure for the WoE adjustment. In this case, we used Odds scaling with a Target score of 600, which corresponds to Target odds of 20 to 1. The final Scorecard is shown below:

The resulting table has three columns: Feature (the scored attribute), Coef (the coefficients of the trained logistic regression model), and Score (the partial scores for the attributes). Because the model predicts the chance of not surviving, the partial scores, which contribute to the survival score, have the opposite sign to the LogReg coefficients.

It is easy to see that a passenger who is female, is a child, or travels in first class has a better chance of surviving. On the other hand, the "third class" attribute sharply decreases the survival score.