Tutorial: Scorecard monitoring

This tutorial continues the two previous scorecard tutorials and focuses on scorecard monitoring. Monitoring determines whether the distribution of new data has shifted with respect to the original data used to develop the scorecard. It is also useful for detecting errors in raw data and for tracking scorecard performance over time.

[1]:

import numpy as np
import pandas as pd

[2]:

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import HuberRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from optbinning import BinningProcess
from optbinning import Scorecard
from optbinning.scorecard import ScorecardMonitoring

Binary target

We use the Home Equity Line of Credit (HELOC) dataset from the FICO Explainable Machine Learning Challenge: https://community.fico.com/s/explainable-machine-learning-challenge.

[3]:

df = pd.read_csv("data/FICO_challenge/heloc_dataset_v1.csv", sep=",")

# The first column is the target; the remaining columns are the predictors.
variable_names = list(df.columns[1:])

# Encode the target as 1 for "Bad" and 0 otherwise ("Good").
target = "RiskPerformance"
y = (df[target] == "Bad").astype(int).values

X = df[variable_names]

For this example, we split the data into training and test sets so that we can assess the robustness of the developed scorecard on the test set.

[4]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

As in the previous example, we specify a list of special codes and a set of selection criteria to be applied in the binning process.

[5]:

special_codes = [-9, -8, -7]

[6]:

selection_criteria = {
    "iv": {"min": 0.02, "max": 1},
    "quality_score": {"min": 0.01}
}

[7]:

binning_process = BinningProcess(variable_names, special_codes=special_codes,
                                 selection_criteria=selection_criteria)

[8]:

estimator = LogisticRegression(solver="lbfgs")

Now, we instantiate a Scorecard object with the binning process and the estimator, using min-max scaling so that scores range from 0 to 100, and fit it with the training data.

[9]:

scorecard = Scorecard(binning_process=binning_process,
                      estimator=estimator, scaling_method="min_max",
                      scaling_method_params={"min": 0, "max": 100})

[10]:

scorecard.fit(X_train, y_train, metric_special="empirical", metric_missing="empirical")

[10]:

Scorecard(binning_process=BinningProcess(selection_criteria={'iv': {'max': 1,
                                                                    'min': 0.02},
                                                             'quality_score': {'min': 0.01}},
                                         special_codes=[-9, -8, -7],
                                         variable_names=['ExternalRiskEstimate',
                                                         'MSinceOldestTradeOpen',
                                                         'MSinceMostRecentTradeOpen',
                                                         'AverageMInFile',
                                                         'NumSatisfactoryTrades',
                                                         'NumTrades60Ever2DerogPubRec',
                                                         'NumTrades90Ever2DerogPubRec',
                                                         'PercentTradesNe...
                                                         'PercentInstallTrades',
                                                         'MSinceMostRecentInqexcl7days',
                                                         'NumInqLast6M',
                                                         'NumInqLast6Mexcl7days',
                                                         'NetFractionRevolvingBurden',
                                                         'NetFractionInstallBurden',
                                                         'NumRevolvingTradesWBalance',
                                                         'NumInstallTradesWBalance',
                                                         'NumBank2NatlTradesWHighUtilization',
                                                         'PercentTradesWBalance']),
          estimator=LogisticRegression(), scaling_method='min_max',
          scaling_method_params={'max': 100, 'min': 0})

[11]:

scorecard.information(print_level=2)

optbinning (Version 0.19.0)
Copyright (c) 2019-2024 Guillermo Navas-Palencia, Apache License 2.0

  Begin options
    binning_process                      yes   * U
    estimator                            yes   * U
    scaling_method                   min_max   * U
    scaling_method_params                yes   * U
    intercept_based                    False   * d
    reverse_scorecard                  False   * d
    rounding                           False   * d
    verbose                            False   * d
  End options

  Statistics
    Number of records                   7321
    Number of variables                   23
    Target type                       binary

    Number of numerical                   23
    Number of categorical                  0
    Number of selected                    21

  Timing
    Total time                          3.03 sec
    Binning process                     2.52 sec   ( 83.32%)
    Estimator                           0.21 sec   (  6.86%)
    Build scorecard                     0.30 sec   (  9.80%)
      rounding                          0.00 sec   (  0.00%)

Once the scorecard is fitted, we use the ScorecardMonitoring class to check that the scorecard remains discriminative on both the training and the test data. The class also analyzes whether the distributions of the training and test data differ significantly. In practice, the training data (X_train, y_train) would be the expected data used for scorecard development, whereas the test data (X_test, y_test) would be the actual, evolved data. Note that fit takes the actual data first and the expected data second.

[12]:

monitoring = ScorecardMonitoring(scorecard=scorecard, psi_method="cart",
                                 psi_n_bins=10, verbose=True)

[13]:

monitoring.fit(X_test, y_test, X_train, y_train)

2024-01-15 00:25:29,298 | INFO : Monitoring started.
2024-01-15 00:25:29,301 | INFO : Options: check parameters.
2024-01-15 00:25:29,304 | INFO : System stability analysis started.
2024-01-15 00:25:29,693 | INFO : System stability analysis terminated. Time: 0.3885s
2024-01-15 00:25:29,696 | INFO : Variable analysis started.
2024-01-15 00:25:29,914 | INFO : Variable analysis terminated. Time: 0.2160s
2024-01-15 00:25:29,916 | INFO : Monitoring terminated. Time: 0.6172s

[13]:

ScorecardMonitoring(psi_n_bins=10,
                    scorecard=Scorecard(binning_process=BinningProcess(selection_criteria={'iv': {'max': 1,
                                                                                                  'min': 0.02},
                                                                                           'quality_score': {'min': 0.01}},
                                                                       special_codes=[-9,
                                                                                      -8,
                                                                                      -7],
                                                                       variable_names=['ExternalRiskEstimate',
                                                                                       'MSinceOldestTradeOpen',
                                                                                       'MSinceMostRecentTradeOpen',
                                                                                       'AverageMInFile',
                                                                                       'NumSatisfactoryTrades',
                                                                                       'NumTrades60Ever2DerogPubRec',
                                                                                       'N...
                                                                                       'MSinceMostRecentInqexcl7days',
                                                                                       'NumInqLast6M',
                                                                                       'NumInqLast6Mexcl7days',
                                                                                       'NetFractionRevolvingBurden',
                                                                                       'NetFractionInstallBurden',
                                                                                       'NumRevolvingTradesWBalance',
                                                                                       'NumInstallTradesWBalance',
                                                                                       'NumBank2NatlTradesWHighUtilization',
                                                                                       'PercentTradesWBalance']),
                                        estimator=LogisticRegression(),
                                        scaling_method='min_max',
                                        scaling_method_params={'max': 100,
                                                               'min': 0}),
                    verbose=True)

As with other OptBinning objects, we can print overview information about the option settings, data statistics, and CPU times.

[14]:

monitoring.information(print_level=2)

optbinning (Version 0.19.0)
Copyright (c) 2019-2024 Guillermo Navas-Palencia, Apache License 2.0

  Begin options
    scorecard                            yes   * U
    psi_method                          cart   * d
    psi_n_bins                            10   * U
    psi_min_bin_size                    0.05   * d
    show_digits                            2   * d
    verbose                             True   * U
  End options

  Statistics
    Number of records Actual            3138
    Number of records Expected          7321
    Number of scorecard variables         21
    Target type                       binary

  Timing
    Total time                          0.62 sec
    System stability                    0.39 sec   ( 62.94%)
    Variables stability                 0.22 sec   ( 34.99%)

The psi_table method returns the Population Stability Index (PSI) table. The PSI is computed with the same formula as the Information Value (IV), also known as the Jeffreys divergence, and measures whether the actual score distribution has shifted from the expected score distribution. To compute it, the score is first segmented into bins with respect to the target, using the options psi_method, psi_n_bins and psi_min_bin_size.

[15]:

monitoring.psi_table()

[15]:

	Bin	Count A	Count E	Count A (%)	Count E (%)	PSI
0	(-inf, 41.29)	236	565	0.075207	0.077175	0.000051
1	[41.29, 46.67)	334	803	0.106437	0.109684	0.000098
2	[46.67, 51.12)	331	807	0.105481	0.110231	0.000209
3	[51.12, 54.76)	310	729	0.098789	0.099577	0.000006
4	[54.76, 57.78)	416	1015	0.132569	0.138642	0.000272
5	[57.78, 60.92)	269	565	0.085723	0.077175	0.000898
6	[60.92, 63.28)	185	417	0.058955	0.056959	0.000069
7	[63.28, 67.20)	298	752	0.094965	0.102718	0.000608
8	[67.20, 72.25)	407	823	0.129700	0.112416	0.002472
9	[72.25, inf)	352	845	0.112173	0.115421	0.000093
Totals		3138	7321	1.000000	1.000000	0.004776
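To make the PSI column concrete, the sketch below recomputes it from the bin counts in the table above using the Jeffreys-divergence form PSI = Σ (a_i − e_i) · ln(a_i / e_i), where a_i and e_i are the fractions of actual and expected records in bin i. It is a standalone illustration, not optbinning's internal implementation, and it assumes every bin is non-empty in both samples (otherwise the logarithm is undefined).

```python
import math

# Bin counts copied from the psi_table output above
# (A = actual/test data, E = expected/training data).
count_actual = [236, 334, 331, 310, 416, 269, 185, 298, 407, 352]
count_expected = [565, 803, 807, 729, 1015, 565, 417, 752, 823, 845]


def psi(actual, expected):
    """Return the per-bin PSI contributions and their sum.

    Each contribution is (a_i - e_i) * ln(a_i / e_i), where a_i and e_i are the
    fractions of records in bin i. Assumes no bin is empty in either sample.
    """
    total_a = sum(actual)
    total_e = sum(expected)
    per_bin = []
    for n_a, n_e in zip(actual, expected):
        a = n_a / total_a
        e = n_e / total_e
        per_bin.append((a - e) * math.log(a / e))
    return per_bin, sum(per_bin)


per_bin, total = psi(count_actual, count_expected)
print("PSI per bin:", [round(v, 6) for v in per_bin])
print(f"PSI total: {total:.6f}")
```

Up to rounding, the per-bin values and the total should match the PSI column of the table.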

We can plot the PSI table using the psi_plot method, which shows the population distribution and the event rate for each bin (Bin ID).

[16]:

monitoring.psi_plot()

[Figure: output of psi_plot() for the binary-target scorecard]

The tests_table method applies a chi-square test in each bin to determine whether the event rates on the training and test data differ significantly. The null hypothesis is that the actual and expected event rates are equal.

[17]:

monitoring.tests_table()

[17]:

	Bin	Count A	Count E	Event rate A	Event rate E	statistic	p-value
0	(-inf, 41.29)	236	565	0.915254	0.916814	0.005285	0.942048
1	[41.29, 46.67)	334	803	0.820359	0.851806	1.758519	0.184809
2	[46.67, 51.12)	331	807	0.758308	0.768278	0.129913	0.718522
3	[51.12, 54.76)	310	729	0.709677	0.662551	2.207654	0.137327
4	[54.76, 57.78)	416	1015	0.562500	0.565517	0.010927	0.916745
5	[57.78, 60.92)	269	565	0.464684	0.500885	0.955733	0.328264
6	[60.92, 63.28)	185	417	0.464865	0.429257	0.659372	0.416782
7	[63.28, 67.20)	298	752	0.322148	0.293883	0.808999	0.368416
8	[67.20, 72.25)	407	823	0.221130	0.196841	0.986264	0.320657
9	[72.25, inf)	352	845	0.130682	0.114793	0.596356	0.439972
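As an illustration, the sketch below reproduces the first row of the table with SciPy. It assumes the test is a Pearson chi-square test on a 2×2 contingency table (sample × event/non-event) without Yates' continuity correction; that assumption was chosen because it reproduces the reported statistic, not because the optbinning source was consulted. The event counts are recovered from the rounded event rates (0.915254 × 236 ≈ 216 and 0.916814 × 565 ≈ 518).

```python
from scipy.stats import chi2_contingency

# Bin 0, (-inf, 41.29), of the tests_table output above.
n_actual, events_actual = 236, 216
n_expected, events_expected = 565, 518

# Rows: sample (actual, expected); columns: (event, non-event).
table = [
    [events_actual, n_actual - events_actual],
    [events_expected, n_expected - events_expected],
]

# Pearson chi-square test without continuity correction (assumption, see above).
statistic, p_value, dof, expected_freq = chi2_contingency(table, correction=False)
print(f"statistic={statistic:.6f}, p-value={p_value:.6f}, dof={dof}")
```

With correction=True, SciPy would apply Yates' correction and, for a difference this small, report a statistic close to zero, so the choice matters when comparing numbers.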

Finally, the system_stability_report method summarizes the findings of the analysis. It also compares the performance of the scorecard on the training and test data to identify whether its predictive power deteriorates on the actual (evolved) data.

[18]:

monitoring.system_stability_report()

-----------------------------------
Monitoring: System Stability Report
-----------------------------------

  Population Stability Index (PSI)


    PSI total:      0.0048 (No significant change)

         PSI bin  Count  Count (%)
    [0.00, 0.10)     10        1.0
    [0.10, 0.25)      0        0.0
    [0.25, Inf+)      0        0.0

  Significance tests (H0: actual == expected)

     p-value bin  Count  Count (%)
    [0.00, 0.05)      0        0.0
    [0.05, 0.10)      0        0.0
    [0.10, 0.50)      7        0.7
    [0.50, 1.00)      3        0.3

  Target analysis

               Metric  Actual Actual (%)  Expected Expected (%)
    Number of records    3138          -      7321            -
        Event records    1638   0.521989      3821     0.521923
    Non-event records    1500   0.478011      3500     0.478077

  Performance metrics

                 Metric   Actual  Expected  Diff A - E
     True positive rate 0.755189  0.770741   -0.015551
     True negative rate 0.696000  0.681429    0.014571
    False positive rate 0.304000  0.318571   -0.014571
    False negative rate 0.244811  0.229259    0.015551
      Balanced accuracy 0.725595  0.726085   -0.000490
     Discriminant power 1.077740  1.087685   -0.009945
                   Gini 0.587042  0.604119   -0.017077
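Several performance metrics in the report follow directly from the true positive and true negative rates. The sketch below recomputes them for the actual data using standard definitions: balanced accuracy = (TPR + TNR) / 2 and discriminant power = √3/π · (ln(TPR/FNR) + ln(TNR/FPR)). These formulas reproduce the reported values, which suggests, but does not prove, that optbinning uses the same definitions. The classification threshold behind TPR and TNR is not shown in the report, so the rates are simply copied from it.

```python
import math

# True positive and true negative rates on the actual (test) data,
# copied from the "Performance metrics" section of the report above.
tpr = 0.755189
tnr = 0.696000

fnr = 1.0 - tpr  # false negative rate
fpr = 1.0 - tnr  # false positive rate
balanced_accuracy = (tpr + tnr) / 2.0

# Discriminant power: sqrt(3)/pi * (ln(TPR / FNR) + ln(TNR / FPR)).
discriminant_power = math.sqrt(3) / math.pi * (
    math.log(tpr / fnr) + math.log(tnr / fpr)
)

print(f"FNR={fnr:.6f}  FPR={fpr:.6f}")
print(f"Balanced accuracy={balanced_accuracy:.6f}")
print(f"Discriminant power={discriminant_power:.6f}")
```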

ScorecardMonitoring also provides a characteristic stability report. The psi_variable_table method returns the PSI of each characteristic (variable) in the scorecard, computed over the optimal bins used by the scorecard.

[19]:

monitoring.psi_variable_table(style="detailed")

[19]:

	Variable	Bin	Count A	Count E	Count A (%)	Count E (%)	PSI
0	ExternalRiskEstimate	(-inf, 59.50)	494	1185	0.157425	0.161863	0.000123
1	ExternalRiskEstimate	[59.50, 63.50)	326	771	0.103888	0.105313	0.000019
2	ExternalRiskEstimate	[63.50, 65.50)	193	488	0.061504	0.066658	0.000415
3	ExternalRiskEstimate	[65.50, 67.50)	205	447	0.065328	0.061057	0.000289
4	ExternalRiskEstimate	[67.50, 70.50)	302	736	0.096240	0.100533	0.000187
...	...	...	...	...	...	...	...
5	PercentTradesWBalance	[67.50, 73.50)	216	516	0.068834	0.070482	0.000039
6	PercentTradesWBalance	[73.50, 75.50)	178	382	0.056724	0.052179	0.000380
7	PercentTradesWBalance	[75.50, 84.50)	328	851	0.104525	0.116241	0.001245
8	PercentTradesWBalance	[84.50, 89.50)	199	434	0.063416	0.059282	0.000279
9	PercentTradesWBalance	[89.50, inf)	481	1134	0.153282	0.154897	0.000017

135 rows × 7 columns

[20]:

monitoring.psi_variable_table(style="summary")

[20]:

	Variable	PSI
0	AverageMInFile	0.004087
1	ExternalRiskEstimate	0.003432
2	MSinceMostRecentDelq	0.001042
3	MSinceMostRecentInqexcl7days	0.001249
4	MSinceMostRecentTradeOpen	0.000180
5	MSinceOldestTradeOpen	0.002839
6	MaxDelq2PublicRecLast12M	0.000514
7	MaxDelqEver	0.000379
8	NetFractionInstallBurden	0.003289
9	NetFractionRevolvingBurden	0.004657
10	NumBank2NatlTradesWHighUtilization	0.000867
11	NumInqLast6M	0.001527
12	NumInqLast6Mexcl7days	0.001399
13	NumRevolvingTradesWBalance	0.001579
14	NumTotalTrades	0.001619
15	NumTrades60Ever2DerogPubRec	0.001896
16	NumTrades90Ever2DerogPubRec	0.000495
17	NumTradesOpeninLast12M	0.001805
18	PercentInstallTrades	0.001951
19	PercentTradesNeverDelq	0.002862
20	PercentTradesWBalance	0.003316
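Since the PSI of a variable is the sum of its per-bin contributions, the detailed table can be aggregated and screened for drifting characteristics. The helper below, summarize_variable_psi, is hypothetical (not part of optbinning). It assumes the detailed table has "Variable" and "PSI" columns with one row per bin and no totals rows, and its default threshold of 0.1 is a common rule-of-thumb cut-off rather than a value taken from optbinning. The demo uses made-up toy data.

```python
import pandas as pd


def summarize_variable_psi(detailed: pd.DataFrame, threshold: float = 0.1) -> pd.DataFrame:
    """Hypothetical helper: sum per-bin PSI by variable and flag large shifts.

    `detailed` is assumed to look like psi_variable_table(style="detailed"),
    i.e. one row per (variable, bin) with "Variable" and "PSI" columns.
    """
    summary = (
        detailed.groupby("Variable", as_index=False)["PSI"].sum()
        .sort_values("PSI", ascending=False, ignore_index=True)
    )
    # Flag variables whose total PSI reaches the (rule-of-thumb) threshold.
    summary["Shifted"] = summary["PSI"] >= threshold
    return summary


# Toy per-bin PSI values (made up for illustration only).
toy = pd.DataFrame({
    "Variable": ["var_a", "var_a", "var_b", "var_b"],
    "PSI": [0.02, 0.01, 0.08, 0.05],
})
result = summarize_variable_psi(toy)
print(result)

# With a fitted monitoring object, one could call instead:
# summarize_variable_psi(monitoring.psi_variable_table(style="detailed"))
```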

Continuous target

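Similar monitoring is available for a scorecard with a continuous target. Here we use the California housing dataset from scikit-learn, whose target is the median house value of each district.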

[21]:

data = fetch_california_housing()

target = "target"
variable_names = data.feature_names
X = pd.DataFrame(data.data, columns=variable_names)
y = data.target

[22]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

[23]:

binning_process = BinningProcess(variable_names)

[24]:

estimator = HuberRegressor(max_iter=200)

[25]:

scorecard = Scorecard(binning_process=binning_process,
                      estimator=estimator, scaling_method="min_max",
                      scaling_method_params={"min": 0, "max": 100},
                      reverse_scorecard=True)

[26]:

scorecard.fit(X_train, y_train)

[26]:

Scorecard(binning_process=BinningProcess(variable_names=['MedInc', 'HouseAge',
                                                         'AveRooms',
                                                         'AveBedrms',
                                                         'Population',
                                                         'AveOccup', 'Latitude',
                                                         'Longitude']),
          estimator=HuberRegressor(max_iter=200), reverse_scorecard=True,
          scaling_method='min_max',
          scaling_method_params={'max': 100, 'min': 0})

[27]:

monitoring = ScorecardMonitoring(scorecard=scorecard, psi_method="cart",
                                 psi_n_bins=10)

monitoring.fit(X_test, y_test, X_train, y_train)

[27]:

ScorecardMonitoring(psi_n_bins=10,
                    scorecard=Scorecard(binning_process=BinningProcess(variable_names=['MedInc',
                                                                                       'HouseAge',
                                                                                       'AveRooms',
                                                                                       'AveBedrms',
                                                                                       'Population',
                                                                                       'AveOccup',
                                                                                       'Latitude',
                                                                                       'Longitude']),
                                        estimator=HuberRegressor(max_iter=200),
                                        reverse_scorecard=True,
                                        scaling_method='min_max',
                                        scaling_method_params={'max': 100,
                                                               'min': 0}))

[28]:

monitoring.information(print_level=2)

optbinning (Version 0.19.0)
Copyright (c) 2019-2024 Guillermo Navas-Palencia, Apache License 2.0

  Begin options
    scorecard                            yes   * U
    psi_method                          cart   * d
    psi_n_bins                            10   * U
    psi_min_bin_size                    0.05   * d
    show_digits                            2   * d
    verbose                            False   * d
  End options

  Statistics
    Number of records Actual            6192
    Number of records Expected         14448
    Number of scorecard variables          8
    Target type                   continuous

  Timing
    Total time                          0.22 sec
    System stability                    0.13 sec   ( 59.21%)
    Variables stability                 0.09 sec   ( 40.35%)

[29]:

monitoring.psi_table()

[29]:

	Bin	Count A	Count E	Count A (%)	Count E (%)	PSI
0	(-inf, 49.51)	318	725	0.051357	0.050180	0.000027
1	[49.51, 51.67)	458	1157	0.073966	0.080080	0.000486
2	[51.67, 53.68)	527	1171	0.085110	0.081049	0.000198
3	[53.68, 56.56)	861	2022	0.139050	0.139950	0.000006
4	[56.56, 59.35)	907	2093	0.146479	0.144864	0.000018
5	[59.35, 60.85)	516	1162	0.083333	0.080426	0.000103
6	[60.85, 63.37)	830	1911	0.134044	0.132267	0.000024
7	[63.37, 66.13)	665	1531	0.107397	0.105966	0.000019
8	[66.13, 70.97)	586	1377	0.094638	0.095307	0.000005
9	[70.97, inf)	524	1299	0.084625	0.089909	0.000320
Totals		6192	14448	1.000000	1.000000	0.001206

[30]:

monitoring.psi_plot()

[Figure: output of psi_plot() for the continuous-target scorecard]

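For a continuous target, the tests_table method compares the mean target value in each bin between the training and test data using a two-sample t-test. The reported statistics are consistent with Welch's version of the test, which does not assume equal variances. The null hypothesis is that the actual and expected means are equal.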

[31]:

monitoring.tests_table()

[31]:

	Bin	Count A	Count E	Mean A	Mean E	Std A	Std E	statistic	p-value
0	(-inf, 49.51)	318	725	0.800129	0.794417	0.335265	0.389576	0.240789	0.809789
1	[49.51, 51.67)	458	1157	1.035358	1.060722	0.481755	0.441926	-0.975857	0.329439
2	[51.67, 53.68)	527	1171	1.253723	1.235909	0.513648	0.473194	0.677242	0.498419
3	[53.68, 56.56)	861	2022	1.416871	1.403359	0.597450	0.576907	0.561459	0.574565
4	[56.56, 59.35)	907	2093	1.659644	1.686652	0.629067	0.680155	-1.053431	0.292281
5	[59.35, 60.85)	516	1162	1.953297	1.909709	0.697145	0.668803	1.196619	0.231753
6	[60.85, 63.37)	830	1911	2.311459	2.237499	0.783756	0.772066	2.280284	0.022726
7	[63.37, 66.13)	665	1531	2.635100	2.653461	0.848958	0.855882	-0.464558	0.642328
8	[66.13, 70.97)	586	1377	3.183081	3.145512	0.920088	0.910585	0.830408	0.406490
9	[70.97, inf)	524	1299	4.084080	4.148884	0.927028	0.879090	-1.370776	0.170778

[32]:

monitoring.system_stability_report()

-----------------------------------
Monitoring: System Stability Report
-----------------------------------

  Population Stability Index (PSI)


    PSI total:      0.0012 (No significant change)

         PSI bin  Count  Count (%)
    [0.00, 0.10)     10        1.0
    [0.10, 0.25)      0        0.0
    [0.25, Inf+)      0        0.0

  Significance tests (H0: actual == expected)

     p-value bin  Count  Count (%)
    [0.00, 0.05)      1        0.1
    [0.05, 0.10)      0        0.0
    [0.10, 0.50)      6        0.6
    [0.50, 1.00)      3        0.3

  Target analysis

    Metric   Actual  Expected
      Mean 2.066968  2.069240
       Std 1.145661  1.157452
       p25 1.202750  1.193000
    Median 1.810000  1.793000
       p75 2.650500  2.646000

  Performance metrics

                   Metric    Actual  Expected  Diff A - E
      Mean absolute error  0.520695  0.516443    0.004253
       Mean squared error  0.509314  0.502051    0.007263
    Median absolute error  0.392306  0.382863    0.009443
       Explained variance  0.616266  0.628663   -0.012397
                      R^2  0.611963  0.625250   -0.013287
                      MPE -0.080802 -0.082159    0.001358
                     MAPE  0.300370  0.297225    0.003145
                    SMAPE  0.136579  0.135519    0.001060
                    MdAPE  0.213391  0.210681    0.002710
                   SMdAPE  0.107431  0.105786    0.001646

[33]:

monitoring.psi_variable_table(style="detailed")

[33]:

	Variable	Bin	Count A	Count E	Count A (%)	Count E (%)	PSI
0	MedInc	(-inf, 1.82)	534	1253	0.086240	0.086725	0.000003
1	MedInc	[1.82, 2.24)	533	1252	0.086079	0.086656	0.000004
2	MedInc	[2.24, 2.57)	511	1112	0.082526	0.076966	0.000388
3	MedInc	[2.57, 2.83)	409	997	0.066053	0.069006	0.000129
4	MedInc	[2.83, 3.07)	383	876	0.061854	0.060631	0.000024
...	...	...	...	...	...	...	...
2	Longitude	[-122.12, -121.45)	815	1923	0.131621	0.133098	0.000016
3	Longitude	[-121.45, -120.69)	485	1180	0.078327	0.081672	0.000140
4	Longitude	[-120.69, -119.76)	303	727	0.048934	0.050318	0.000039
5	Longitude	[-119.76, -118.91)	363	858	0.058624	0.059385	0.000010
6	Longitude	[-118.91, inf)	3297	7542	0.532461	0.522010	0.000207

72 rows × 7 columns

[34]:

monitoring.psi_variable_table(style="summary")

[34]:

	Variable	PSI
0	AveBedrms	0.001911
1	AveOccup	0.003518
2	AveRooms	0.001540
3	HouseAge	0.001855
4	Latitude	0.003628
5	Longitude	0.000635
6	MedInc	0.001032
7	Population	0.000904