# Tutorial: Scorecard with binary target¶

In this tutorial, we use the dataset from the FICO Explainable Machine Learning Challenge: https://community.fico.com/s/explainable-machine-learning-challenge. The goal is to develop a scorecard using the logistic regression as an estimator.

[1]:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

[2]:

from sklearn.linear_model import LogisticRegression

from optbinning import BinningProcess
from optbinning import Scorecard
from optbinning.scorecard import plot_auc_roc, plot_cap, plot_ks


[3]:

df = pd.read_csv("data/FICO_challenge/heloc_dataset_v1.csv", sep=",")

[4]:

variable_names = list(df.columns[1:])
X = df[variable_names]


Transform the categorical dichotomic target variable into numerical.

[5]:

target = "RiskPerformance"
y = df[target].values
y = y.astype(int)

[6]:

df.head()

[6]:

0 1 55 144 4 84 20 3 0 83 2 ... 43 0 0 0 33 -8 8 1 1 69
1 1 61 58 15 41 2 4 4 100 -7 ... 67 0 0 0 0 -8 0 -8 -8 0
2 1 67 66 5 24 9 0 0 100 -7 ... 44 0 4 4 53 66 4 2 1 86
3 1 66 169 1 73 28 1 1 93 76 ... 57 0 5 4 72 83 6 4 3 91
4 1 81 333 27 132 12 0 0 100 -7 ... 25 0 1 1 51 89 3 1 0 80

5 rows × 24 columns

## Scorecard development¶

This dataset includes three special values/codes:

• -9 No Bureau Record or No Investigation

• -8 No Usable/Valid Trades or Inquiries

• -7 Condition not Met (e.g. No Inquiries, No Delinquencies)

[7]:

special_codes = [-9, -8, -7]


We specify a selection criteria in terms of the Information Value (IV) predictiveness and minimum quality score to remove low-quality variables. Then, we instantiate a BinningProcess object class with variable names, special codes and selection criteria.

[8]:

selection_criteria = {
"iv": {"min": 0.02, "max": 1},
"quality_score": {"min": 0.01}
}

[9]:

binning_process = BinningProcess(variable_names, special_codes=special_codes,
selection_criteria=selection_criteria)


We select as an estimator a logistic regression to be solved using the non-linear solver L-BFGS-B.

[10]:

estimator = LogisticRegression(solver="lbfgs")


Finally, we instantiate a Scorecard class with the target name, a binning process object, and an estimator. In addition, we want to apply a scaling method to the scorecard points.

[11]:

scorecard = Scorecard(binning_process=binning_process,
estimator=estimator, scaling_method="min_max",
scaling_method_params={"min": 300, "max": 850})

[12]:

scorecard.fit(X, y, show_digits=4)

/home/gui/projects/github/top/optbinning/optbinning/binning/transformations.py:38: RuntimeWarning: invalid value encountered in log
return np.log((1. / event_rate - 1) * n_event / n_nonevent)

[12]:

Scorecard(binning_process=BinningProcess(selection_criteria={'iv': {'max': 1,
'min': 0.02},
'quality_score': {'min': 0.01}},
special_codes=[-9, -8, -7],
variable_names=['ExternalRiskEstimate',
'AverageMInFile',
'MSinceMostRecentInqexcl7days',
'NumInqLast6M',
'NumInqLast6Mexcl7days',
'NetFractionRevolvingBurden',
'NetFractionInstallBurden',
estimator=LogisticRegression(), scaling_method='min_max',
scaling_method_params={'max': 850, 'min': 300})


Similar to other objects in OptBinning, we can print overview information about the options settings, problems statistics, and the number of selected variables after the binning process. With these settings, using the selection criteria, 4 variables are removed.

[13]:

scorecard.information(print_level=2)

optbinning (Version 0.19.0)

Begin options
binning_process                      yes   * U
estimator                            yes   * U
scaling_method                   min_max   * U
scaling_method_params                yes   * U
intercept_based                    False   * d
reverse_scorecard                  False   * d
rounding                           False   * d
verbose                            False   * d
End options

Statistics
Number of records                  10459
Number of variables                   23
Target type                       binary

Number of numerical                   23
Number of categorical                  0
Number of selected                    19

Timing
Total time                          3.28 sec
Binning process                     2.62 sec   ( 79.92%)
Estimator                           0.25 sec   (  7.70%)
Build scorecard                     0.41 sec   ( 12.36%)
rounding                          0.00 sec   (  0.00%)



The method table returns the scorecard table. A scorecard table has a wide range of real-world business applications, being an interpretable tool to summarize relationships among variables. The scorecard table can handle binary and continuous targets. Two scorecard styles are available: style="summary" shows the variable name, and their corresponding bins and assigned points; style="detailed" adds information from the corresponding binning table.

[14]:

scorecard.table(style="summary")

[14]:

Variable Bin Points
0 ExternalRiskEstimate (-inf, 59.5000) 5.359275
1 ExternalRiskEstimate [59.5000, 63.5000) 11.598078
2 ExternalRiskEstimate [63.5000, 65.5000) 18.168973
3 ExternalRiskEstimate [65.5000, 67.5000) 19.821705
4 ExternalRiskEstimate [67.5000, 70.5000) 25.498720
... ... ... ...

164 rows × 3 columns

[15]:

scorecard.table(style="detailed")

[15]:

Variable Bin id Bin Count Count (%) Non-event Event Event rate WoE IV JS Coefficient Points
0 ExternalRiskEstimate 0 (-inf, 59.5000) 1081 0.103356 166 915 0.846438 -1.619109 0.217629 0.024574 -0.327969 5.359275
1 ExternalRiskEstimate 1 [59.5000, 63.5000) 1097 0.104886 228 869 0.792160 -1.250170 0.142003 0.016678 -0.327969 11.598078
2 ExternalRiskEstimate 2 [63.5000, 65.5000) 681 0.065111 190 491 0.720999 -0.861592 0.044754 0.005427 -0.327969 18.168973
3 ExternalRiskEstimate 3 [65.5000, 67.5000) 652 0.062339 195 457 0.700920 -0.763856 0.034156 0.004169 -0.327969 19.821705
4 ExternalRiskEstimate 4 [67.5000, 70.5000) 1038 0.099245 388 650 0.626204 -0.428139 0.017755 0.002203 -0.327969 25.498720
... ... ... ... ... ... ... ... ... ... ... ... ... ...
8 PercentTradesWBalance 8 [80.5000, 87.5000) 797 0.076202 283 514 0.644918 -0.508949 0.019114 0.002364 -0.016322 32.310289
9 PercentTradesWBalance 9 [87.5000, 98.0000) 652 0.062339 184 468 0.717791 -0.845705 0.041380 0.005024 -0.016322 32.026880
10 PercentTradesWBalance 10 [98.0000, inf) 1277 0.122096 331 946 0.740799 -0.962296 0.103054 0.012407 -0.016322 31.928758
11 PercentTradesWBalance 11 Special 606 0.057941 269 337 0.556106 -0.137544 0.001091 0.000136 -0.016322 32.738612
12 PercentTradesWBalance 12 Missing 0 0.000000 0 0 0.000000 0.000000 0.000000 0.000000 -0.016322 32.738612

164 rows × 13 columns

We can check the correctness of the scaling method as follows

[16]:

sc = scorecard.table(style="summary")
sc.groupby("Variable").agg({'Points' : [np.min, np.max]}).sum()

[16]:

Points  amin    300.0
amax    850.0
dtype: float64


## Scorecard performance¶

Compute predicted probabilities of the fitted estimator.

[17]:

y_pred = scorecard.predict_proba(X)[:, 1]

/home/gui/projects/github/top/optbinning/optbinning/binning/transformations.py:38: RuntimeWarning: invalid value encountered in log
return np.log((1. / event_rate - 1) * n_event / n_nonevent)


Plot Area Under the Receiver Operating Characteristic Curve (AUC ROC).

[18]:

plot_auc_roc(y, y_pred)


Plot Cumulative Accuracy Profile (CAP).

[19]:

plot_cap(y, y_pred)


Plot Kolmogorov-Smirnov (KS).

[20]:

plot_ks(y, y_pred)


Calculate the score of the dataset and plot distribution of scores for event and non-event records.

[21]:

score = scorecard.score(X)

[22]:

mask = y == 0