Tutorial: Scorecard monitoring¶
This tutorial is a continuation of the two previous scorecard tutorials, where we focus on scorecard monitoring. Scorecard monitoring is important to determine if the distribution of new data has shifted with respect to the original data used to develop the scorecard. Besides, monitoring is also useful to detect errors in raw data and track scorecard performance.
[1]:
import numpy as np
import pandas as pd
[2]:
from sklearn.datasets import load_breast_cancer
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import HuberRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from optbinning import BinningProcess
from optbinning import Scorecard
from optbinning.scorecard import ScorecardMonitoring
Binary target¶
We use the Home equity line of credit (HELOC) dataset from FICO Explainable Machine Learning Challenge https://community.fico.com/s/explainable-machine-learning-challenge.
[3]:
df = pd.read_csv("data/FICO_challenge/heloc_dataset_v1.csv", sep=",")
variable_names = list(df.columns[1:])
target = "RiskPerformance"
y = df[target].values
mask = y == "Bad"
y[mask] = 1
y[~mask] = 0
y = y.astype(int)
X = df[variable_names]
For this example, we split data to compare the robustness of the developed scorecard in the test dataset.
[4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)
As in the previous example, we specify a list of special codes and a selection criteria to be applied in the binning process.
[5]:
special_codes = [-9, -8, -7]
[6]:
selection_criteria = {
"iv": {"min": 0.02, "max": 1},
"quality_score": {"min": 0.01}
}
[7]:
binning_process = BinningProcess(variable_names, special_codes=special_codes,
selection_criteria=selection_criteria)
[8]:
estimator = LogisticRegression(solver="lbfgs")
Now, we instantiate a Scorecard
class with the target name, a binning process object, and an estimator, and fit with training data.
[9]:
scorecard = Scorecard(binning_process=binning_process,
estimator=estimator, scaling_method="min_max",
scaling_method_params={"min": 0, "max": 100})
[10]:
scorecard.fit(X_train, y_train, metric_special="empirical", metric_missing="empirical")
[10]:
Scorecard(binning_process=BinningProcess(selection_criteria={'iv': {'max': 1,
'min': 0.02},
'quality_score': {'min': 0.01}},
special_codes=[-9, -8, -7],
variable_names=['ExternalRiskEstimate',
'MSinceOldestTradeOpen',
'MSinceMostRecentTradeOpen',
'AverageMInFile',
'NumSatisfactoryTrades',
'NumTrades60Ever2DerogPubRec',
'NumTrades90Ever2DerogPubRec',
'PercentTradesNe...
'PercentInstallTrades',
'MSinceMostRecentInqexcl7days',
'NumInqLast6M',
'NumInqLast6Mexcl7days',
'NetFractionRevolvingBurden',
'NetFractionInstallBurden',
'NumRevolvingTradesWBalance',
'NumInstallTradesWBalance',
'NumBank2NatlTradesWHighUtilization',
'PercentTradesWBalance']),
estimator=LogisticRegression(), scaling_method='min_max',
scaling_method_params={'max': 100, 'min': 0})
[11]:
scorecard.information(print_level=2)
optbinning (Version 0.19.0)
Copyright (c) 2019-2024 Guillermo Navas-Palencia, Apache License 2.0
Begin options
binning_process yes * U
estimator yes * U
scaling_method min_max * U
scaling_method_params yes * U
intercept_based False * d
reverse_scorecard False * d
rounding False * d
verbose False * d
End options
Statistics
Number of records 7321
Number of variables 23
Target type binary
Number of numerical 23
Number of categorical 0
Number of selected 21
Timing
Total time 3.03 sec
Binning process 2.52 sec ( 83.32%)
Estimator 0.21 sec ( 6.86%)
Build scorecard 0.30 sec ( 9.80%)
rounding 0.00 sec ( 0.00%)
Once the scorecard is fitted, we use the ScorecardMonitoring
class to ensure that the resulting scorecard is discriminating using train and test data. Furthermore, this class analyzes whether the distribution of train and test data differ significantly. In practice, df_train
would be the (expected) data used for scorecard development, whereas df_test
would be the (actual) evolved data.
[12]:
monitoring = ScorecardMonitoring(scorecard=scorecard, psi_method="cart",
psi_n_bins=10, verbose=True)
[13]:
monitoring.fit(X_test, y_test, X_train, y_train)
2024-01-15 00:25:29,298 | INFO : Monitoring started.
2024-01-15 00:25:29,301 | INFO : Options: check parameters.
2024-01-15 00:25:29,304 | INFO : System stability analysis started.
2024-01-15 00:25:29,693 | INFO : System stability analysis terminated. Time: 0.3885s
2024-01-15 00:25:29,696 | INFO : Variable analysis started.
2024-01-15 00:25:29,914 | INFO : Variable analysis terminated. Time: 0.2160s
2024-01-15 00:25:29,916 | INFO : Monitoring terminated. Time: 0.6172s
[13]:
ScorecardMonitoring(psi_n_bins=10,
scorecard=Scorecard(binning_process=BinningProcess(selection_criteria={'iv': {'max': 1,
'min': 0.02},
'quality_score': {'min': 0.01}},
special_codes=[-9,
-8,
-7],
variable_names=['ExternalRiskEstimate',
'MSinceOldestTradeOpen',
'MSinceMostRecentTradeOpen',
'AverageMInFile',
'NumSatisfactoryTrades',
'NumTrades60Ever2DerogPubRec',
'N...
'MSinceMostRecentInqexcl7days',
'NumInqLast6M',
'NumInqLast6Mexcl7days',
'NetFractionRevolvingBurden',
'NetFractionInstallBurden',
'NumRevolvingTradesWBalance',
'NumInstallTradesWBalance',
'NumBank2NatlTradesWHighUtilization',
'PercentTradesWBalance']),
estimator=LogisticRegression(),
scaling_method='min_max',
scaling_method_params={'max': 100,
'min': 0}),
verbose=True)
Similar to other objects in OptBinning, we can print overview information about the options settings, data statistics and CPU times.
[14]:
monitoring.information(print_level=2)
optbinning (Version 0.19.0)
Copyright (c) 2019-2024 Guillermo Navas-Palencia, Apache License 2.0
Begin options
scorecard yes * U
psi_method cart * d
psi_n_bins 10 * U
psi_min_bin_size 0.05 * d
show_digits 2 * d
verbose True * U
End options
Statistics
Number of records Actual 3138
Number of records Expected 7321
Number of scorecard variables 21
Target type binary
Timing
Total time 0.62 sec
System stability 0.39 sec ( 62.94%)
Variables stability 0.22 sec ( 34.99%)
The method psi_table
returns the Population Stability Index (PSI) table. The PSI is a divergence measure equivalent to the Information Value (IV), also known as Jeffry’s divergence. This measure assesses whether the actual score distribution has shifted from the expected score distribution. This analysis requires the segmentation of the score with respect to the target, using the options psi_method
, psi_n_bins
and psi_min_bin_size
.
[15]:
monitoring.psi_table()
[15]:
Bin | Count A | Count E | Count A (%) | Count E (%) | PSI | |
---|---|---|---|---|---|---|
0 | (-inf, 41.29) | 236 | 565 | 0.075207 | 0.077175 | 0.000051 |
1 | [41.29, 46.67) | 334 | 803 | 0.106437 | 0.109684 | 0.000098 |
2 | [46.67, 51.12) | 331 | 807 | 0.105481 | 0.110231 | 0.000209 |
3 | [51.12, 54.76) | 310 | 729 | 0.098789 | 0.099577 | 0.000006 |
4 | [54.76, 57.78) | 416 | 1015 | 0.132569 | 0.138642 | 0.000272 |
5 | [57.78, 60.92) | 269 | 565 | 0.085723 | 0.077175 | 0.000898 |
6 | [60.92, 63.28) | 185 | 417 | 0.058955 | 0.056959 | 0.000069 |
7 | [63.28, 67.20) | 298 | 752 | 0.094965 | 0.102718 | 0.000608 |
8 | [67.20, 72.25) | 407 | 823 | 0.129700 | 0.112416 | 0.002472 |
9 | [72.25, inf) | 352 | 845 | 0.112173 | 0.115421 | 0.000093 |
Totals | 3138 | 7321 | 1.000000 | 1.000000 | 0.004776 |
We can plot the PSI table using method psi_plot
, where the population distribution and event rate for each bin (Bin ID) are shown.
[16]:
monitoring.psi_plot()
This analysis computes statistical tests to determine if the event rate on train and test data are significantly different using the Chi-square test. The null hypothesis is that actual = expected.
[17]:
monitoring.tests_table()
[17]:
Bin | Count A | Count E | Event rate A | Event rate E | statistic | p-value | |
---|---|---|---|---|---|---|---|
0 | (-inf, 41.29) | 236 | 565 | 0.915254 | 0.916814 | 0.005285 | 0.942048 |
1 | [41.29, 46.67) | 334 | 803 | 0.820359 | 0.851806 | 1.758519 | 0.184809 |
2 | [46.67, 51.12) | 331 | 807 | 0.758308 | 0.768278 | 0.129913 | 0.718522 |
3 | [51.12, 54.76) | 310 | 729 | 0.709677 | 0.662551 | 2.207654 | 0.137327 |
4 | [54.76, 57.78) | 416 | 1015 | 0.562500 | 0.565517 | 0.010927 | 0.916745 |
5 | [57.78, 60.92) | 269 | 565 | 0.464684 | 0.500885 | 0.955733 | 0.328264 |
6 | [60.92, 63.28) | 185 | 417 | 0.464865 | 0.429257 | 0.659372 | 0.416782 |
7 | [63.28, 67.20) | 298 | 752 | 0.322148 | 0.293883 | 0.808999 | 0.368416 |
8 | [67.20, 72.25) | 407 | 823 | 0.221130 | 0.196841 | 0.986264 | 0.320657 |
9 | [72.25, inf) | 352 | 845 | 0.130682 | 0.114793 | 0.596356 | 0.439972 |
Finally, the system_stability_report
method summarizes the findings encountered throughout the analysis. Moreover, it returns a comparison with the performance of the provided scorecard on both train and test data, to identify if the developed scorecard suffers a deterioration in predictiveness on the actual/evolved data.
[18]:
monitoring.system_stability_report()
-----------------------------------
Monitoring: System Stability Report
-----------------------------------
Population Stability Index (PSI)
PSI total: 0.0048 (No significant change)
PSI bin Count Count (%)
[0.00, 0.10) 10 1.0
[0.10, 0.25) 0 0.0
[0.25, Inf+) 0 0.0
Significance tests (H0: actual == expected)
p-value bin Count Count (%)
[0.00, 0.05) 0 0.0
[0.05, 0.10) 0 0.0
[0.10, 0.50) 7 0.7
[0.50, 1.00) 3 0.3
Target analysis
Metric Actual Actual (%) Expected Expected (%)
Number of records 3138 - 7321 -
Event records 1638 0.521989 3821 0.521923
Non-event records 1500 0.478011 3500 0.478077
Performance metrics
Metric Actual Expected Diff A - E
True positive rate 0.755189 0.770741 -0.015551
True negative rate 0.696000 0.681429 0.014571
False positive rate 0.304000 0.318571 -0.014571
False negative rate 0.244811 0.229259 0.015551
Balanced accuracy 0.725595 0.726085 -0.000490
Discriminant power 1.077740 1.087685 -0.009945
Gini 0.587042 0.604119 -0.017077
The ScorecardMonitoring
also implements functionalities to perform the characteristic stability report. The psi_variable_table
method returns the PSI using the optimal bins incorporated in the scorecard at a characteristic level.
[19]:
monitoring.psi_variable_table(style="detailed")
[19]:
Variable | Bin | Count A | Count E | Count A (%) | Count E (%) | PSI | |
---|---|---|---|---|---|---|---|
0 | ExternalRiskEstimate | (-inf, 59.50) | 494 | 1185 | 0.157425 | 0.161863 | 0.000123 |
1 | ExternalRiskEstimate | [59.50, 63.50) | 326 | 771 | 0.103888 | 0.105313 | 0.000019 |
2 | ExternalRiskEstimate | [63.50, 65.50) | 193 | 488 | 0.061504 | 0.066658 | 0.000415 |
3 | ExternalRiskEstimate | [65.50, 67.50) | 205 | 447 | 0.065328 | 0.061057 | 0.000289 |
4 | ExternalRiskEstimate | [67.50, 70.50) | 302 | 736 | 0.096240 | 0.100533 | 0.000187 |
... | ... | ... | ... | ... | ... | ... | ... |
5 | PercentTradesWBalance | [67.50, 73.50) | 216 | 516 | 0.068834 | 0.070482 | 0.000039 |
6 | PercentTradesWBalance | [73.50, 75.50) | 178 | 382 | 0.056724 | 0.052179 | 0.000380 |
7 | PercentTradesWBalance | [75.50, 84.50) | 328 | 851 | 0.104525 | 0.116241 | 0.001245 |
8 | PercentTradesWBalance | [84.50, 89.50) | 199 | 434 | 0.063416 | 0.059282 | 0.000279 |
9 | PercentTradesWBalance | [89.50, inf) | 481 | 1134 | 0.153282 | 0.154897 | 0.000017 |
135 rows × 7 columns
[20]:
monitoring.psi_variable_table(style="summary")
[20]:
Variable | PSI | |
---|---|---|
0 | AverageMInFile | 0.004087 |
1 | ExternalRiskEstimate | 0.003432 |
2 | MSinceMostRecentDelq | 0.001042 |
3 | MSinceMostRecentInqexcl7days | 0.001249 |
4 | MSinceMostRecentTradeOpen | 0.000180 |
5 | MSinceOldestTradeOpen | 0.002839 |
6 | MaxDelq2PublicRecLast12M | 0.000514 |
7 | MaxDelqEver | 0.000379 |
8 | NetFractionInstallBurden | 0.003289 |
9 | NetFractionRevolvingBurden | 0.004657 |
10 | NumBank2NatlTradesWHighUtilization | 0.000867 |
11 | NumInqLast6M | 0.001527 |
12 | NumInqLast6Mexcl7days | 0.001399 |
13 | NumRevolvingTradesWBalance | 0.001579 |
14 | NumTotalTrades | 0.001619 |
15 | NumTrades60Ever2DerogPubRec | 0.001896 |
16 | NumTrades90Ever2DerogPubRec | 0.000495 |
17 | NumTradesOpeninLast12M | 0.001805 |
18 | PercentInstallTrades | 0.001951 |
19 | PercentTradesNeverDelq | 0.002862 |
20 | PercentTradesWBalance | 0.003316 |
Continuous target¶
Similar monitoring is available for a scorecard with a continuous target.
[21]:
data = fetch_california_housing()
target = "target"
variable_names = data.feature_names
X = pd.DataFrame(data.data, columns=variable_names)
y = data.target
[22]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
[23]:
binning_process = BinningProcess(variable_names)
[24]:
estimator = HuberRegressor(max_iter=200)
[25]:
scorecard = Scorecard(binning_process=binning_process,
estimator=estimator, scaling_method="min_max",
scaling_method_params={"min": 0, "max": 100},
reverse_scorecard=True)
[26]:
scorecard.fit(X_train, y_train)
[26]:
Scorecard(binning_process=BinningProcess(variable_names=['MedInc', 'HouseAge',
'AveRooms',
'AveBedrms',
'Population',
'AveOccup', 'Latitude',
'Longitude']),
estimator=HuberRegressor(max_iter=200), reverse_scorecard=True,
scaling_method='min_max',
scaling_method_params={'max': 100, 'min': 0})
[27]:
monitoring = ScorecardMonitoring(scorecard=scorecard, psi_method="cart",
psi_n_bins=10)
monitoring.fit(X_test, y_test, X_train, y_train)
[27]:
ScorecardMonitoring(psi_n_bins=10,
scorecard=Scorecard(binning_process=BinningProcess(variable_names=['MedInc',
'HouseAge',
'AveRooms',
'AveBedrms',
'Population',
'AveOccup',
'Latitude',
'Longitude']),
estimator=HuberRegressor(max_iter=200),
reverse_scorecard=True,
scaling_method='min_max',
scaling_method_params={'max': 100,
'min': 0}))
[28]:
monitoring.information(print_level=2)
optbinning (Version 0.19.0)
Copyright (c) 2019-2024 Guillermo Navas-Palencia, Apache License 2.0
Begin options
scorecard yes * U
psi_method cart * d
psi_n_bins 10 * U
psi_min_bin_size 0.05 * d
show_digits 2 * d
verbose False * d
End options
Statistics
Number of records Actual 6192
Number of records Expected 14448
Number of scorecard variables 8
Target type continuous
Timing
Total time 0.22 sec
System stability 0.13 sec ( 59.21%)
Variables stability 0.09 sec ( 40.35%)
[29]:
monitoring.psi_table()
[29]:
Bin | Count A | Count E | Count A (%) | Count E (%) | PSI | |
---|---|---|---|---|---|---|
0 | (-inf, 49.51) | 318 | 725 | 0.051357 | 0.050180 | 0.000027 |
1 | [49.51, 51.67) | 458 | 1157 | 0.073966 | 0.080080 | 0.000486 |
2 | [51.67, 53.68) | 527 | 1171 | 0.085110 | 0.081049 | 0.000198 |
3 | [53.68, 56.56) | 861 | 2022 | 0.139050 | 0.139950 | 0.000006 |
4 | [56.56, 59.35) | 907 | 2093 | 0.146479 | 0.144864 | 0.000018 |
5 | [59.35, 60.85) | 516 | 1162 | 0.083333 | 0.080426 | 0.000103 |
6 | [60.85, 63.37) | 830 | 1911 | 0.134044 | 0.132267 | 0.000024 |
7 | [63.37, 66.13) | 665 | 1531 | 0.107397 | 0.105966 | 0.000019 |
8 | [66.13, 70.97) | 586 | 1377 | 0.094638 | 0.095307 | 0.000005 |
9 | [70.97, inf) | 524 | 1299 | 0.084625 | 0.089909 | 0.000320 |
Totals | 6192 | 14448 | 1.000000 | 1.000000 | 0.001206 |
[30]:
monitoring.psi_plot()
This analysis computes statistical tests to determine if the mean on train and test data are significantly different using the Student’s t-test. The null hypothesis is that actual = expected.
[31]:
monitoring.tests_table()
[31]:
Bin | Count A | Count E | Mean A | Mean E | Std A | Std E | statistic | p-value | |
---|---|---|---|---|---|---|---|---|---|
0 | (-inf, 49.51) | 318 | 725 | 0.800129 | 0.794417 | 0.335265 | 0.389576 | 0.240789 | 0.809789 |
1 | [49.51, 51.67) | 458 | 1157 | 1.035358 | 1.060722 | 0.481755 | 0.441926 | -0.975857 | 0.329439 |
2 | [51.67, 53.68) | 527 | 1171 | 1.253723 | 1.235909 | 0.513648 | 0.473194 | 0.677242 | 0.498419 |
3 | [53.68, 56.56) | 861 | 2022 | 1.416871 | 1.403359 | 0.597450 | 0.576907 | 0.561459 | 0.574565 |
4 | [56.56, 59.35) | 907 | 2093 | 1.659644 | 1.686652 | 0.629067 | 0.680155 | -1.053431 | 0.292281 |
5 | [59.35, 60.85) | 516 | 1162 | 1.953297 | 1.909709 | 0.697145 | 0.668803 | 1.196619 | 0.231753 |
6 | [60.85, 63.37) | 830 | 1911 | 2.311459 | 2.237499 | 0.783756 | 0.772066 | 2.280284 | 0.022726 |
7 | [63.37, 66.13) | 665 | 1531 | 2.635100 | 2.653461 | 0.848958 | 0.855882 | -0.464558 | 0.642328 |
8 | [66.13, 70.97) | 586 | 1377 | 3.183081 | 3.145512 | 0.920088 | 0.910585 | 0.830408 | 0.406490 |
9 | [70.97, inf) | 524 | 1299 | 4.084080 | 4.148884 | 0.927028 | 0.879090 | -1.370776 | 0.170778 |
[32]:
monitoring.system_stability_report()
-----------------------------------
Monitoring: System Stability Report
-----------------------------------
Population Stability Index (PSI)
PSI total: 0.0012 (No significant change)
PSI bin Count Count (%)
[0.00, 0.10) 10 1.0
[0.10, 0.25) 0 0.0
[0.25, Inf+) 0 0.0
Significance tests (H0: actual == expected)
p-value bin Count Count (%)
[0.00, 0.05) 1 0.1
[0.05, 0.10) 0 0.0
[0.10, 0.50) 6 0.6
[0.50, 1.00) 3 0.3
Target analysis
Metric Actual Expected
Mean 2.066968 2.069240
Std 1.145661 1.157452
p25 1.202750 1.193000
Median 1.810000 1.793000
p75 2.650500 2.646000
Performance metrics
Metric Actual Expected Diff A - E
Mean absolute error 0.520695 0.516443 0.004253
Mean squared error 0.509314 0.502051 0.007263
Median absolute error 0.392306 0.382863 0.009443
Explained variance 0.616266 0.628663 -0.012397
R^2 0.611963 0.625250 -0.013287
MPE -0.080802 -0.082159 0.001358
MAPE 0.300370 0.297225 0.003145
SMAPE 0.136579 0.135519 0.001060
MdAPE 0.213391 0.210681 0.002710
SMdAPE 0.107431 0.105786 0.001646
[33]:
monitoring.psi_variable_table(style="detailed")
[33]:
Variable | Bin | Count A | Count E | Count A (%) | Count E (%) | PSI | |
---|---|---|---|---|---|---|---|
0 | MedInc | (-inf, 1.82) | 534 | 1253 | 0.086240 | 0.086725 | 0.000003 |
1 | MedInc | [1.82, 2.24) | 533 | 1252 | 0.086079 | 0.086656 | 0.000004 |
2 | MedInc | [2.24, 2.57) | 511 | 1112 | 0.082526 | 0.076966 | 0.000388 |
3 | MedInc | [2.57, 2.83) | 409 | 997 | 0.066053 | 0.069006 | 0.000129 |
4 | MedInc | [2.83, 3.07) | 383 | 876 | 0.061854 | 0.060631 | 0.000024 |
... | ... | ... | ... | ... | ... | ... | ... |
2 | Longitude | [-122.12, -121.45) | 815 | 1923 | 0.131621 | 0.133098 | 0.000016 |
3 | Longitude | [-121.45, -120.69) | 485 | 1180 | 0.078327 | 0.081672 | 0.000140 |
4 | Longitude | [-120.69, -119.76) | 303 | 727 | 0.048934 | 0.050318 | 0.000039 |
5 | Longitude | [-119.76, -118.91) | 363 | 858 | 0.058624 | 0.059385 | 0.000010 |
6 | Longitude | [-118.91, inf) | 3297 | 7542 | 0.532461 | 0.522010 | 0.000207 |
72 rows × 7 columns
[34]:
monitoring.psi_variable_table(style="summary")
[34]:
Variable | PSI | |
---|---|---|
0 | AveBedrms | 0.001911 |
1 | AveOccup | 0.003518 |
2 | AveRooms | 0.001540 |
3 | HouseAge | 0.001855 |
4 | Latitude | 0.003628 |
5 | Longitude | 0.000635 |
6 | MedInc | 0.001032 |
7 | Population | 0.000904 |