# Tutorial: optimal binning with binary target under uncertainty

The drawback of performing optimal binning with only expected event rates is that the variability of event rates across periods is not taken into account. In this tutorial, we show how scenario-based stochastic programming incorporates this uncertainty into the binning problem with little extra effort.

[1]:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from scipy import stats

[2]:

from optbinning import OptimalBinning
from optbinning.binning.uncertainty import SBOptimalBinning


## Scenario generation

We generate three equally likely scenarios that represent economic conditions of different severity, using a customer score as the variable to bin.

Scenario 0 - Normal (Realistic): A low customer score has a higher event rate (default rate, churn, etc.) than a high customer score. The populations corresponding to non-event and event are reasonably well separated.

[3]:

N0 = int(1e5)

xe = stats.beta(a=4, b=15).rvs(size=N0, random_state=42)
ye = stats.bernoulli(p=0.7).rvs(size=N0, random_state=42)
xn = stats.beta(a=6, b=8).rvs(size=N0, random_state=42)
yn = stats.bernoulli(p=0.2).rvs(size=N0, random_state=42)

x0 = np.concatenate((xn, xe), axis=0)
y0 = np.concatenate((yn, ye), axis=0)

[4]:

def plot_distribution(x, y):
    plt.hist(x[y == 0], label="n_nonevent", color="b", alpha=0.5)
    plt.hist(x[y == 1], label="n_event", color="r", alpha=0.5)
    plt.legend()
    plt.show()

[5]:

plot_distribution(x0, y0)


Scenario 1 - Good (Optimistic): A low customer score has a much higher event rate (default rate, churn, etc.) than a high customer score. The populations corresponding to non-event and event are very well separated, with minimal overlap.

[6]:

N1 = int(5e4)

xe = stats.beta(a=25, b=50).rvs(size=N1, random_state=42)
ye = stats.bernoulli(p=0.9).rvs(size=N1, random_state=42)
xn = stats.beta(a=22, b=25).rvs(size=N1, random_state=42)
yn = stats.bernoulli(p=0.05).rvs(size=N1, random_state=42)

x1 = np.concatenate((xn, xe), axis=0)
y1 = np.concatenate((yn, ye), axis=0)

[7]:

plot_distribution(x1, y1)


Scenario 2 - Bad (Pessimistic): Customer behavior cannot be accurately segmented, and event rates increase across the board. The populations corresponding to non-event and event overlap almost completely.

[8]:

N2 = int(5e4)

xe = stats.beta(a=4, b=6).rvs(size=N2, random_state=42)
ye = stats.bernoulli(p=0.7).rvs(size=N2, random_state=42)
xn = stats.beta(a=8, b=10).rvs(size=N2, random_state=42)
yn = stats.bernoulli(p=0.4).rvs(size=N2, random_state=42)

x2 = np.concatenate((xn, xe), axis=0)
y2 = np.concatenate((yn, ye), axis=0)

[9]:

plot_distribution(x2, y2)


## Scenario-based stochastic optimal binning

Prepare the scenario data and instantiate an SBOptimalBinning object. We set a descending monotonicity constraint on the event rate and a minimum bin size of 5% of the records.

[10]:

X = [x0, x1, x2]
Y = [y0, y1, y2]

[11]:

sboptb = SBOptimalBinning(monotonic_trend="descending", min_bin_size=0.05)
sboptb.fit(X, Y)

[11]:

SBOptimalBinning(min_bin_size=0.05, monotonic_trend='descending')

[12]:

sboptb.status

[12]:

'OPTIMAL'


We obtain only three splits, which guarantee feasibility in every scenario.

[13]:

sboptb.splits

[13]:

array([0.28578988, 0.36384453, 0.43260857])
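As a rough illustration of how these split points partition the score range, the sketch below assigns a few scores to left-closed, right-open bins using only the standard library. The helper name `bin_index` and the sample scores are hypothetical and are not part of OptBinning; the split values are copied from the output above.

```python
from bisect import bisect_right

# Split points copied from the sboptb.splits output above.
splits = [0.28578988, 0.36384453, 0.43260857]


def bin_index(value, splits):
    """Return the index of the bin [lower, upper) that contains value.

    Hypothetical helper for illustration only.
    """
    # bisect_right counts how many split points are <= value, which is
    # the index of a left-closed, right-open bin.
    return bisect_right(splits, value)


# Illustrative scores, including one that sits exactly on a split point.
scores = [0.10, 0.29, 0.36384453, 0.50]
bin_ids = [bin_index(s, splits) for s in scores]
print(bin_ids)
```

Three split points define four bins, which matches the four regular bins of the binning table in the next section.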

[14]:

sboptb.information(print_level=2)

optbinning (Version 0.19.0)
Copyright (c) 2019-2024 Guillermo Navas-Palencia, Apache License 2.0

Begin options
name                                       * d
prebinning_method                   cart   * d
max_n_prebins                         20   * d
min_prebin_size                     0.05   * d
min_n_bins                            no   * d
max_n_bins                            no   * d
min_bin_size                        0.05   * U
max_bin_size                          no   * d
monotonic_trend               descending   * U
min_event_rate_diff                    0   * d
max_pvalue                            no   * d
max_pvalue_policy            consecutive   * d
class_weight                          no   * d
user_splits                           no   * d
user_splits_fixed                     no   * d
special_codes                         no   * d
split_digits                          no   * d
time_limit                           100   * d
verbose                            False   * d
End options

Name    : UNKNOWN
Status  : OPTIMAL

Pre-binning statistics
Number of pre-bins                    16
Number of refinements                  1

Solver statistics
Type                                  cp
Number of booleans                    40
Number of branches                    91
Number of conflicts                    1
Objective value                  2736534
Best objective bound             2736534

Timing
Total time                          1.21 sec
Pre-processing                      0.01 sec   (  0.90%)
Pre-binning                         0.70 sec   ( 58.18%)
Solver                              0.49 sec   ( 40.72%)
  model generation                  0.44 sec   ( 90.22%)
  optimizer                         0.05 sec   (  9.78%)
Post-processing                     0.00 sec   (  0.10%)



### The binning table

Like the other optimal binning algorithms in OptBinning, SBOptimalBinning returns a binning table that displays the binned data aggregated over all scenarios.

[15]:

sboptb.binning_table.build()

[15]:

Bin Count Count (%) Non-event Event Event rate WoE IV JS
0 (-inf, 0.29) 119678 0.299195 42005 77673 0.649017 -0.688603 0.138209 0.016943
1 [0.29, 0.36) 79729 0.199323 32837 46892 0.588142 -0.430175 0.036613 0.004542
2 [0.36, 0.43) 68378 0.170945 39045 29333 0.428983 0.212118 0.007633 0.000952
3 [0.43, inf) 132215 0.330537 93498 38717 0.292834 0.807778 0.201811 0.024562
4 Special 0 0.000000 0 0 0.000000 0.000000 0.000000 0.000000
5 Missing 0 0.000000 0 0 0.000000 0.000000 0.000000 0.000000
Totals 400000 1.000000 207385 192615 0.481538 0.384266 0.046999
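The WoE and IV columns can be reproduced from the Non-event and Event counts alone. The sketch below assumes the usual definitions, WoE = ln(share of non-events / share of events) and IV = (share of non-events - share of events) * WoE, which are consistent with the figures in the table above. The counts are copied by hand from that table.

```python
import math

# (non_event, event) counts per bin, copied from the pooled binning table.
bins = [(42005, 77673), (32837, 46892), (39045, 29333), (93498, 38717)]

total_non_event = sum(ne for ne, _ in bins)
total_event = sum(e for _, e in bins)

woes = []
iv_total = 0.0
for i, (ne, e) in enumerate(bins):
    p_ne = ne / total_non_event   # share of all non-events falling in bin i
    p_e = e / total_event         # share of all events falling in bin i
    woe = math.log(p_ne / p_e)    # weight of evidence of bin i
    iv = (p_ne - p_e) * woe       # contribution of bin i to the total IV
    woes.append(woe)
    iv_total += iv
    print(f"bin {i}: WoE={woe:.6f}, IV={iv:.6f}")

print(f"total IV={iv_total:.6f}")
```

Bins whose event share exceeds their non-event share get a negative WoE, which is why WoE increases as the event rate decreases across the bins.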
[16]:

sboptb.binning_table.plot(metric="event_rate")

[17]:

sboptb.binning_table.analysis()

---------------------------------------------
OptimalBinning: Binary Binning Table Analysis
---------------------------------------------

General metrics

Gini index               0.33117510
IV (Jeffrey)             0.38426602
JS (Jensen-Shannon)      0.04699884
Hellinger                0.04750871
Triangular               0.18408357
KS                       0.28582022
HHI                      0.26772434
HHI (normalized)         0.12126921
Cramer's V               0.30285954
Quality score            0.87798527

Monotonic trend            descending

Significance tests

Bin A  Bin B  t-statistic       p-value  P[A > B]     P[B > A]
0      1   756.303469 1.709260e-166       1.0 1.110223e-16
1      2  3732.973381  0.000000e+00       1.0 1.110223e-16
2      3  3726.998391  0.000000e+00       1.0 1.110223e-16



## Expected value solution (EVS)

The expected value solution is obtained by solving a standard optimal binning problem on the normal (expected) scenario only.

[18]:

optb = OptimalBinning(monotonic_trend="descending", min_bin_size=0.05)
optb.fit(x0, y0)

[18]:

OptimalBinning(min_bin_size=0.05, monotonic_trend='descending')

[19]:

optb.binning_table.build()

[19]:

Bin Count Count (%) Non-event Event Event rate WoE IV JS
0 (-inf, 0.10) 10255 0.051275 3061 7194 0.701511 -1.054853 0.054945 0.006566
1 [0.10, 0.14) 12519 0.062595 3911 8608 0.687595 -0.989246 0.059422 0.007139
2 [0.14, 0.18) 18333 0.091665 6065 12268 0.669176 -0.904807 0.073418 0.008877
3 [0.18, 0.20) 13631 0.068155 4884 8747 0.641699 -0.783094 0.041320 0.005037
4 [0.20, 0.23) 14606 0.073030 5684 8922 0.610845 -0.651212 0.030891 0.003795
5 [0.23, 0.27) 17995 0.089975 8043 9952 0.553043 -0.413319 0.015470 0.001920
6 [0.27, 0.30) 13047 0.065235 6672 6375 0.488618 -0.154812 0.001572 0.000196
7 [0.30, 0.35) 18825 0.094125 11158 7667 0.407278 0.174884 0.002847 0.000355
8 [0.35, 0.39) 16401 0.082005 10903 5498 0.335223 0.484306 0.018430 0.002282
9 [0.39, 0.44) 18759 0.093795 13688 5071 0.270324 0.792634 0.053994 0.006578
10 [0.44, 0.49) 14549 0.072745 11273 3276 0.225170 1.035440 0.068446 0.008193
11 [0.49, 0.54) 11019 0.055095 8697 2322 0.210727 1.120202 0.059684 0.007093
12 [0.54, 0.60) 10030 0.050150 7957 2073 0.206680 1.144708 0.056454 0.006695
13 [0.60, inf) 10031 0.050155 7988 2043 0.203669 1.163174 0.058080 0.006877
14 Special 0 0.000000 0 0 0.000000 0.000000 0.000000 0.000000
15 Missing 0 0.000000 0 0 0.000000 0.000000 0.000000 0.000000
Totals 200000 1.000000 109984 90016 0.450080 0.594974 0.071603
[20]:

optb.binning_table.plot(metric="event_rate")

[21]:

optb.binning_table.analysis()

---------------------------------------------
OptimalBinning: Binary Binning Table Analysis
---------------------------------------------

General metrics

Gini index               0.42141055
IV (Jeffrey)             0.59497411
JS (Jensen-Shannon)      0.07160267
Hellinger                0.07295186
Triangular               0.27638899
KS                       0.34108533
HHI                      0.07501900
HHI (normalized)         0.01335360
Cramer's V               0.36927482
Quality score            0.16335319

Monotonic trend            descending

Significance tests

Bin A  Bin B  t-statistic      p-value  P[A > B]     P[B > A]
0      1     5.139745 2.338408e-02  0.988706 1.129409e-02
1      2    11.534993 6.829832e-04  0.999721 2.787731e-04
2      3    26.208899 3.064073e-07  1.000000 7.445353e-09
3      4    28.661681 8.619251e-08  1.000000 4.436704e-09
4      5   110.500800 7.611468e-26  1.000000 1.110223e-16
5      6   125.906119 3.223792e-29  1.000000 1.110223e-16
6      7   206.865709 6.632897e-47  1.000000 1.110223e-16
7      8   194.419542 3.449032e-44  1.000000 1.110223e-16
8      9   175.309976 5.122903e-40  1.000000 1.110223e-16
9     10    88.957203 4.034468e-21  1.000000 1.110223e-16
10     11     7.648694 5.681344e-03  0.997558 2.442113e-03
11     12     0.520543 4.706103e-01  0.764881 2.351195e-01
12     13     0.278879 5.974371e-01  0.701329 2.986709e-01



## Scenario analysis

### Scenario 0 - Normal (Realistic)

[22]:

bt0 = sboptb.binning_table_scenario(scenario_id=0)
bt0.build()

[22]:

Bin Count Count (%) Non-event Event Event rate WoE IV JS
0 (-inf, 0.29) 93851 0.469255 34903 58948 0.628102 -0.724430 0.244506 0.029912
1 [0.29, 0.36) 32141 0.160705 18945 13196 0.410566 0.161279 0.004138 0.000517
2 [0.36, 0.43) 24488 0.122440 17289 7199 0.293981 0.675781 0.052184 0.006402
3 [0.43, inf) 49520 0.247600 38847 10673 0.215529 1.091566 0.256123 0.030515
4 Special 0 0.000000 0 0 0.000000 0.000000 0.000000 0.000000
5 Missing 0 0.000000 0 0 0.000000 0.000000 0.000000 0.000000
Totals 200000 1.000000 109984 90016 0.450080 0.556952 0.067345
[23]:

bt0.plot(metric="event_rate")

[24]:

optb0 = OptimalBinning(monotonic_trend="descending", min_bin_size=0.05)
optb0.fit(x0, y0)

[24]:

OptimalBinning(min_bin_size=0.05, monotonic_trend='descending')

[25]:

optb0.binning_table.build()

[25]:

Bin Count Count (%) Non-event Event Event rate WoE IV JS
0 (-inf, 0.10) 10255 0.051275 3061 7194 0.701511 -1.054853 0.054945 0.006566
1 [0.10, 0.14) 12519 0.062595 3911 8608 0.687595 -0.989246 0.059422 0.007139
2 [0.14, 0.18) 18333 0.091665 6065 12268 0.669176 -0.904807 0.073418 0.008877
3 [0.18, 0.20) 13631 0.068155 4884 8747 0.641699 -0.783094 0.041320 0.005037
4 [0.20, 0.23) 14606 0.073030 5684 8922 0.610845 -0.651212 0.030891 0.003795
5 [0.23, 0.27) 17995 0.089975 8043 9952 0.553043 -0.413319 0.015470 0.001920
6 [0.27, 0.30) 13047 0.065235 6672 6375 0.488618 -0.154812 0.001572 0.000196
7 [0.30, 0.35) 18825 0.094125 11158 7667 0.407278 0.174884 0.002847 0.000355
8 [0.35, 0.39) 16401 0.082005 10903 5498 0.335223 0.484306 0.018430 0.002282
9 [0.39, 0.44) 18759 0.093795 13688 5071 0.270324 0.792634 0.053994 0.006578
10 [0.44, 0.49) 14549 0.072745 11273 3276 0.225170 1.035440 0.068446 0.008193
11 [0.49, 0.54) 11019 0.055095 8697 2322 0.210727 1.120202 0.059684 0.007093
12 [0.54, 0.60) 10030 0.050150 7957 2073 0.206680 1.144708 0.056454 0.006695
13 [0.60, inf) 10031 0.050155 7988 2043 0.203669 1.163174 0.058080 0.006877
14 Special 0 0.000000 0 0 0.000000 0.000000 0.000000 0.000000
15 Missing 0 0.000000 0 0 0.000000 0.000000 0.000000 0.000000
Totals 200000 1.000000 109984 90016 0.450080 0.594974 0.071603
[26]:

optb0.binning_table.plot(metric="event_rate")


Apply expected value solution to scenario 0.

[27]:

evs_optb0 = OptimalBinning(user_splits=optb.splits)
evs_optb0.fit(x0, y0)

[27]:

OptimalBinning(user_splits=array([0.1008646 , 0.13640077, 0.17635711, 0.20390539, 0.23334569,
0.27135116, 0.30051835, 0.34623086, 0.38964605, 0.44464479,
0.49356021, 0.53990114, 0.59801421]))

[28]:

evs_optb0.binning_table.build()

[28]:

Bin Count Count (%) Non-event Event Event rate WoE IV JS
0 (-inf, 0.10) 10255 0.051275 3061 7194 0.701511 -1.054853 0.054945 0.006566
1 [0.10, 0.14) 12519 0.062595 3911 8608 0.687595 -0.989246 0.059422 0.007139
2 [0.14, 0.18) 18333 0.091665 6065 12268 0.669176 -0.904807 0.073418 0.008877
3 [0.18, 0.20) 13631 0.068155 4884 8747 0.641699 -0.783094 0.041320 0.005037
4 [0.20, 0.23) 14606 0.073030 5684 8922 0.610845 -0.651212 0.030891 0.003795
5 [0.23, 0.27) 17995 0.089975 8043 9952 0.553043 -0.413319 0.015470 0.001920
6 [0.27, 0.30) 13047 0.065235 6672 6375 0.488618 -0.154812 0.001572 0.000196
7 [0.30, 0.35) 18825 0.094125 11158 7667 0.407278 0.174884 0.002847 0.000355
8 [0.35, 0.39) 16401 0.082005 10903 5498 0.335223 0.484306 0.018430 0.002282
9 [0.39, 0.44) 18759 0.093795 13688 5071 0.270324 0.792634 0.053994 0.006578
10 [0.44, 0.49) 14549 0.072745 11273 3276 0.225170 1.035440 0.068446 0.008193
11 [0.49, 0.54) 11019 0.055095 8697 2322 0.210727 1.120202 0.059684 0.007093
12 [0.54, 0.60) 10030 0.050150 7957 2073 0.206680 1.144708 0.056454 0.006695
13 [0.60, inf) 10031 0.050155 7988 2043 0.203669 1.163174 0.058080 0.006877
14 Special 0 0.000000 0 0 0.000000 0.000000 0.000000 0.000000
15 Missing 0 0.000000 0 0 0.000000 0.000000 0.000000 0.000000
Totals 200000 1.000000 109984 90016 0.450080 0.594974 0.071603
[29]:

evs_optb0.binning_table.plot(metric="event_rate")


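The expected value solution applied to scenario 0 is feasible: every bin satisfies the min_bin_size constraint and the event rate is monotonically descending. This is expected, since the EVS was computed on scenario 0 itself, so its IV equals that of the optimal binning for this scenario.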

[30]:

EVS_0 = 0.594974


### Scenario 1 - Good (Optimistic)

[31]:

bt1 = sboptb.binning_table_scenario(scenario_id=1)
bt1.build()

[31]:

Bin Count Count (%) Non-event Event Event rate WoE IV JS
0 (-inf, 0.29) 9840 0.09840 1126 8714 0.885569 -2.146624 0.347828 0.036679
1 [0.29, 0.36) 29807 0.29807 5902 23905 0.801993 -1.499161 0.586072 0.067087
2 [0.36, 0.43) 24262 0.24262 12658 11604 0.478279 -0.013425 0.000044 0.000005
3 [0.43, inf) 36091 0.36091 32821 3270 0.090604 2.205914 1.226988 0.128301
4 Special 0 0.00000 0 0 0.000000 0.000000 0.000000 0.000000
5 Missing 0 0.00000 0 0 0.000000 0.000000 0.000000 0.000000
Totals 100000 1.00000 52507 47493 0.474930 2.160931 0.232072
[32]:

bt1.plot(metric="event_rate")

[33]:

optb1 = OptimalBinning(monotonic_trend="descending", min_bin_size=0.05)
optb1.fit(x1, y1)

[33]:

OptimalBinning(min_bin_size=0.05, monotonic_trend='descending')

[34]:

optb1.binning_table.build()

[34]:

Bin Count Count (%) Non-event Event Event rate WoE IV JS
0 (-inf, 0.28) 8209 0.08209 908 7301 0.889390 -2.184886 0.298095 0.031264
1 [0.28, 0.30) 5545 0.05545 738 4807 0.866907 -1.974249 0.172075 0.018581
2 [0.30, 0.31) 5186 0.05186 777 4409 0.850174 -1.836327 0.143301 0.015756
3 [0.31, 0.33) 5837 0.05837 956 4881 0.836217 -1.730712 0.146359 0.016307
4 [0.33, 0.34) 5176 0.05176 1077 4099 0.791924 -1.436928 0.094544 0.010896
5 [0.34, 0.36) 7055 0.07055 1760 5295 0.750532 -1.201813 0.093706 0.011056
6 [0.36, 0.38) 8537 0.08537 2882 5655 0.662411 -0.774420 0.049704 0.006062
7 [0.38, 0.40) 6189 0.06189 2802 3387 0.547261 -0.289975 0.005205 0.000648
8 [0.40, 0.41) 5058 0.05058 2862 2196 0.434164 0.164519 0.001360 0.000170
9 [0.41, 0.44) 8246 0.08246 5781 2465 0.298933 0.752021 0.043766 0.005345
10 [0.44, 0.45) 5253 0.05253 4321 932 0.177422 1.433545 0.089840 0.010358
11 [0.45, 0.47) 5009 0.05009 4420 589 0.117588 1.915105 0.137461 0.014960
12 [0.47, 0.49) 5204 0.05204 4780 424 0.081476 2.322098 0.190662 0.019603
13 [0.49, 0.53) 8825 0.08825 8283 542 0.061416 2.626330 0.384332 0.037733
14 [0.53, 0.56) 5061 0.05061 4807 254 0.050188 2.840130 0.244824 0.023237
15 [0.56, inf) 5610 0.05610 5353 257 0.045811 2.935972 0.283430 0.026488
16 Special 0 0.00000 0 0 0.000000 0.000000 0.000000 0.000000
17 Missing 0 0.00000 0 0 0.000000 0.000000 0.000000 0.000000
Totals 100000 1.00000 52507 47493 0.474930 2.378665 0.248465
[35]:

optb1.binning_table.plot(metric="event_rate")


Apply expected value solution to scenario 1.

[36]:

evs_optb1 = OptimalBinning(user_splits=optb.splits)
evs_optb1.fit(x1, y1)

[36]:

OptimalBinning(user_splits=array([0.1008646 , 0.13640077, 0.17635711, 0.20390539, 0.23334569,
0.27135116, 0.30051835, 0.34623086, 0.38964605, 0.44464479,
0.49356021, 0.53990114, 0.59801421]))

[37]:

evs_optb1.binning_table.build()

[37]:

Bin Count Count (%) Non-event Event Event rate WoE IV JS
0 (-inf, 0.20) 247 0.00247 26 221 0.894737 -2.240430 0.009316 0.000969
1 [0.20, 0.23) 1092 0.01092 118 974 0.891941 -2.211091 0.040377 0.004219
2 [0.23, 0.27) 5037 0.05037 566 4471 0.887632 -2.167137 0.180654 0.018995
3 [0.27, 0.30) 7918 0.07918 1019 6899 0.871306 -2.012919 0.253339 0.027214
4 [0.30, 0.35) 18126 0.18126 3313 14813 0.817224 -1.598015 0.397590 0.045005
5 [0.35, 0.39) 17091 0.17091 5742 11349 0.664034 -0.781686 0.101310 0.012351
6 [0.39, 0.44) 18095 0.18095 11857 6238 0.344736 0.541895 0.051194 0.006322
7 [0.44, 0.49) 14295 0.14295 12739 1556 0.108849 2.002186 0.420164 0.045199
8 [0.49, 0.54) 10111 0.10111 9523 588 0.058154 2.684374 0.453620 0.044133
9 [0.54, 0.60) 6215 0.06215 5918 297 0.047788 2.891658 0.307832 0.028976
10 [0.60, inf) 1773 0.01773 1686 87 0.049069 2.863842 0.086712 0.008199
11 Special 0 0.00000 0 0 0.000000 0.000000 0.000000 0.000000
12 Missing 0 0.00000 0 0 0.000000 0.000000 0.000000 0.000000
Totals 100000 1.00000 52507 47493 0.474930 2.302108 0.241582
[38]:

evs_optb1.binning_table.plot(metric="event_rate")

[39]:

evs_optb1.binning_table.analysis()

---------------------------------------------
OptimalBinning: Binary Binning Table Analysis
---------------------------------------------

General metrics

Gini index               0.72566718
IV (Jeffrey)             2.30210757
JS (Jensen-Shannon)      0.24158211
Hellinger                0.26182242
Triangular               0.84830395
KS                       0.61004329
HHI                      0.13857518
HHI (normalized)         0.06678978
Cramer's V               0.64902999
Quality score            0.00000000

Monotonic trend                valley

Significance tests

Bin A  Bin B  t-statistic       p-value  P[A > B]     P[B > A]
0      1     0.016401  8.980961e-01  0.566231 4.337689e-01
1      2     0.168135  6.817748e-01  0.666387 3.336129e-01
2      3     7.641448  5.704212e-03  0.997457 2.543322e-03
3      4   116.236493  4.218674e-27  1.000000 1.110223e-16
4      5  1080.747496 5.050568e-237  1.000000 1.110223e-16
5      6  3584.325798  0.000000e+00  1.000000 1.110223e-16
6      7  2431.847481  0.000000e+00  1.000000 1.110223e-16
7      8   189.938108  3.279750e-43  1.000000 1.110223e-16
8      9     8.068486  4.504174e-03  0.998346 1.654368e-03
9     10     0.049526  8.238907e-01  0.420232 5.797684e-01



The expected value solution applied to scenario 1 satisfies neither the min_bin_size constraint nor the monotonicity constraint, hence the solution is not feasible.
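Both violations can be read off the table above. The sketch below makes the check explicit under two assumptions: a bin is too small when its count fraction is below min_bin_size, and the trend is descending when no bin has a higher event rate than the bin before it. The function `check_binning` is a hypothetical helper, not an OptBinning API, and the figures are copied by hand from the `evs_optb1` binning table.

```python
def check_binning(count_fractions, event_rates, min_bin_size=0.05):
    """Return the indices of bins violating each constraint.

    Hypothetical helper for illustration only.
    """
    # Bins holding less than min_bin_size of the records.
    too_small = [i for i, f in enumerate(count_fractions) if f < min_bin_size]
    # Bins whose event rate is higher than that of the previous bin.
    not_descending = [
        i for i in range(1, len(event_rates))
        if event_rates[i] > event_rates[i - 1]
    ]
    return too_small, not_descending


# "Count (%)" and "Event rate" columns of the EVS applied to scenario 1.
count_fractions = [0.00247, 0.01092, 0.05037, 0.07918, 0.18126, 0.17091,
                   0.18095, 0.14295, 0.10111, 0.06215, 0.01773]
event_rates = [0.894737, 0.891941, 0.887632, 0.871306, 0.817224, 0.664034,
               0.344736, 0.108849, 0.058154, 0.047788, 0.049069]

too_small, not_descending = check_binning(count_fractions, event_rates)
feasible = not too_small and not not_descending
print("bins below min_bin_size:", too_small)
print("bins breaking the descending trend:", not_descending)
print("feasible:", feasible)
```

The last bin is both too small and has a slightly higher event rate than its predecessor, which agrees with the "valley" trend reported by the analysis.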

[40]:

EVS_1 = -np.inf


### Scenario 2 - Bad (Pessimistic)

[41]:

bt2 = sboptb.binning_table_scenario(scenario_id=2)
bt2.build()

[41]:

Bin Count Count (%) Non-event Event Event rate WoE IV JS
0 (-inf, 0.29) 15987 0.15987 5976 10011 0.626196 -0.310979 1.509941e-02 1.879858e-03
1 [0.29, 0.36) 17781 0.17781 7990 9791 0.550644 0.001682 5.028570e-07 6.285711e-08
2 [0.36, 0.43) 19628 0.19628 9098 10530 0.536479 0.058781 6.800268e-04 8.499112e-05
3 [0.43, inf) 46604 0.46604 21830 24774 0.531585 0.078445 2.877876e-03 3.596423e-04
4 Special 0 0.00000 0 0 0.000000 0.000000 0.000000e+00 0.000000e+00
5 Missing 0 0.00000 0 0 0.000000 0.000000 0.000000e+00 0.000000e+00
Totals 100000 1.00000 44894 55106 0.551060 1.865782e-02 2.324554e-03
[42]:

bt2.plot(metric="event_rate")
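With the three scenario tables available, one can cross-check that their counts add up, bin by bin, to the pooled table returned by `sboptb.binning_table`. The sketch below is only a sanity check on the numbers printed in this tutorial; the counts are copied by hand from the tables of `bt0`, `bt1` and `bt2`.

```python
# (count, event) per bin, copied from the scenario binning tables.
scenario_tables = {
    0: [(93851, 58948), (32141, 13196), (24488, 7199), (49520, 10673)],
    1: [(9840, 8714), (29807, 23905), (24262, 11604), (36091, 3270)],
    2: [(15987, 10011), (17781, 9791), (19628, 10530), (46604, 24774)],
}

n_bins = 4
pooled = []
for b in range(n_bins):
    # Add up records and events of bin b over the three scenarios.
    count = sum(table[b][0] for table in scenario_tables.values())
    event = sum(table[b][1] for table in scenario_tables.values())
    pooled.append((count, event, event / count))

for b, (count, event, rate) in enumerate(pooled):
    print(f"bin {b}: count={count}, event={event}, event_rate={rate:.6f}")
```

The resulting counts and event rates coincide with the Count, Event and Event rate columns of the pooled binning table.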

[43]:

optb2 = OptimalBinning(monotonic_trend="descending", min_bin_size=0.05)
optb2.fit(x2, y2)

[43]:

OptimalBinning(min_bin_size=0.05, monotonic_trend='descending')

[44]:

optb2.binning_table.build()

[44]:

Bin Count Count (%) Non-event Event Event rate WoE IV JS
0 (-inf, 0.23) 7556 0.07556 2543 5013 0.663446 -0.473736 0.016261 0.002014
1 [0.23, 0.29) 9657 0.09657 3918 5739 0.594284 -0.176749 0.002982 0.000372
2 [0.29, 0.33) 8559 0.08559 3801 4758 0.555906 -0.019609 0.000033 0.000004
3 [0.33, 0.39) 15848 0.15848 7234 8614 0.543539 0.030358 0.000146 0.000018
4 [0.39, inf) 58380 0.58380 27398 30982 0.530695 0.082018 0.003941 0.000493
5 Special 0 0.00000 0 0 0.000000 0.000000 0.000000 0.000000
6 Missing 0 0.00000 0 0 0.000000 0.000000 0.000000 0.000000
Totals 100000 1.00000 44894 55106 0.551060 0.023364 0.002901
[45]:

optb2.binning_table.plot(metric="event_rate")


Apply expected value solution to scenario 2.

[46]:

evs_optb2 = OptimalBinning(user_splits=optb.splits)
evs_optb2.fit(x2, y2)

[46]:

OptimalBinning(user_splits=array([0.1008646 , 0.13640077, 0.17635711, 0.20390539, 0.23334569,
0.27135116, 0.30051835, 0.34623086, 0.38964605, 0.44464479,
0.49356021, 0.53990114, 0.59801421]))

[47]:

evs_optb2.binning_table.build()

[47]:

Bin Count Count (%) Non-event Event Event rate WoE IV JS
0 (-inf, 0.14) 1292 0.01292 405 887 0.686533 -0.579003 0.004096 5.050214e-04
1 [0.14, 0.18) 1850 0.01850 598 1252 0.676757 -0.533952 0.005019 6.200181e-04
2 [0.18, 0.20) 2002 0.02002 709 1293 0.645854 -0.395910 0.003037 3.771741e-04
3 [0.20, 0.23) 2944 0.02944 1049 1895 0.643682 -0.386427 0.004259 5.291176e-04
4 [0.23, 0.27) 5326 0.05326 2134 3192 0.599324 -0.197695 0.002054 2.563524e-04
5 [0.27, 0.30) 5390 0.05390 2291 3099 0.574954 -0.097137 0.000506 6.318381e-05
6 [0.30, 0.35) 10414 0.10414 4667 5747 0.551853 -0.003207 0.000001 1.338664e-07
7 [0.35, 0.39) 11782 0.11782 5375 6407 0.543796 0.029322 0.000101 1.267992e-05
8 [0.39, 0.44) 15901 0.15901 7509 8392 0.527766 0.093778 0.001404 1.754450e-04
9 [0.44, 0.54) 23757 0.23757 11416 12341 0.519468 0.127043 0.003854 4.814509e-04
10 [0.54, 0.60) 9639 0.09639 4529 5110 0.530138 0.084256 0.000687 8.582857e-05
11 [0.60, inf) 9703 0.09703 4212 5491 0.565907 -0.060218 0.000351 4.382723e-05
12 Special 0 0.00000 0 0 0.000000 0.000000 0.000000 0.000000e+00
13 Missing 0 0.00000 0 0 0.000000 0.000000 0.000000 0.000000e+00
Totals 100000 1.00000 44894 55106 0.551060 0.025370 3.150233e-03
[48]:

evs_optb2.binning_table.plot(metric="event_rate")

[49]:

evs_optb2.binning_table.analysis()

---------------------------------------------
OptimalBinning: Binary Binning Table Analysis
---------------------------------------------

General metrics

Gini index               0.07657686
IV (Jeffrey)             0.02536981
JS (Jensen-Shannon)      0.00315023
Hellinger                0.00316066
Triangular               0.01251904
KS                       0.05109803
HHI                      0.13267476
HHI (normalized)         0.06595743
Cramer's V               0.07812501
Quality score            0.00318975

Monotonic trend                valley

Significance tests

Bin A  Bin B  t-statistic      p-value     P[A > B]  P[B > A]
0      1     0.334525 5.630065e-01 7.193040e-01  0.280696
1      2     4.095897 4.298741e-02 9.789856e-01  0.021014
2      3     0.024540 8.755193e-01 5.627139e-01  0.437286
3      4    15.757594 7.199834e-05 9.999743e-01  0.000026
4      5     6.563219 1.041079e-02 9.951398e-01  0.004860
5      6     7.690930 5.549902e-03 9.973851e-01  0.002615
6      7     1.448735 2.287310e-01 8.858561e-01  0.114144
7      8     6.989473 8.199050e-03 9.961485e-01  0.003852
8      9     2.628779 1.049424e-01 9.478637e-01  0.052136
9     10     3.128995 7.691114e-02 3.817921e-02  0.961821
10     11    24.977930 5.799032e-07 4.353127e-08  1.000000



The expected value solution applied to scenario 2 satisfies neither the min_bin_size constraint nor the monotonicity constraint, hence the solution is not feasible.

[50]:

EVS_2 = -np.inf


## Expected value of perfect information (EVPI)

If we had prior information about the incoming economic scenario, we could use the optimal solution of each scenario, which gives the expected total IV:

[51]:

DIV0 = optb0.binning_table.iv
DIV1 = optb1.binning_table.iv
DIV2 = optb2.binning_table.iv
DIV = (DIV0 + DIV1 + DIV2) / 3

[52]:

DIV

[52]:

0.9990011753826167


However, this information is unlikely to be available in advance, so the best we can do in the long run is to use the stochastic programming solution, with expected total IV:

[53]:

SIV = sboptb.binning_table.iv

[54]:

SIV

[54]:

0.38426601503532376


The difference between the two is the expected value of perfect information (EVPI), that is, the IV we would gain on average if the scenario were known in advance:

[55]:

EVPI = DIV - SIV
EVPI

[55]:

0.6147351603472929


## Value of stochastic solution (VSS)

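The value of the stochastic solution (VSS) measures the loss in IV incurred by ignoring stochasticity: it is the difference between the stochastic model IV and the expected IV obtained by applying the expected value solution to each scenario. Since applying the EVS yields infeasible solutions for scenarios 1 and 2, that expected IV is minus infinity and the VSS is unbounded: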

[56]:

VSS = SIV - (EVS_0 + EVS_1 + EVS_2) / 3  # scenarios are equally likely
VSS

[56]:

inf
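The infinite result follows from encoding infeasibility as an IV of minus infinity: a single infeasible scenario with positive probability is enough to drive the expected EVS to minus infinity. The sketch below reproduces this with the standard library; `expected_evs` is a hypothetical helper, and skipping zero-probability scenarios is an assumption added here to avoid the undefined product 0 * inf.

```python
import math


def expected_evs(evs_values, probabilities):
    """Probability-weighted EVS value; infeasible scenarios are -inf."""
    total = 0.0
    for value, p in zip(evs_values, probabilities):
        if p == 0:
            # A scenario that cannot occur contributes nothing; skipping it
            # also avoids 0 * -inf, which evaluates to nan.
            continue
        total += p * value
    return total


# EVS_0 is feasible; EVS_1 and EVS_2 are infeasible.
evs = [0.594974, -math.inf, -math.inf]
probabilities = [1 / 3, 1 / 3, 1 / 3]
stochastic_iv = 0.384266

vss = stochastic_iv - expected_evs(evs, probabilities)
print(vss)
```

An unbounded VSS should be read qualitatively: the expected value solution is unusable in some scenarios, whereas the stochastic solution remains feasible in all of them.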