Tutorial: optimal binning with binary target under uncertainty

The drawback of performing optimal binning on expected event rates alone is that it ignores how event rates vary from one period to another. In this tutorial, we show how scenario-based stochastic programming incorporates this uncertainty with little extra effort.

[1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from scipy import stats
[2]:
from optbinning import OptimalBinning
from optbinning.binning.uncertainty import SBOptimalBinning

Scenario generation

We generate three equally likely scenarios, intended to represent economic conditions of differing severity, using a customer’s score as the binning variable.

Scenario 0: Normal (Realistic): A customer with a low score has a higher event rate (default rate, churn, etc.) than a customer with a high score. The non-event and event populations are reasonably well separated.

[3]:
N0 = int(1e5)

xe = stats.beta(a=4, b=15).rvs(size=N0, random_state=42)
ye = stats.bernoulli(p=0.7).rvs(size=N0, random_state=42)
xn = stats.beta(a=6, b=8).rvs(size=N0, random_state=42)
yn = stats.bernoulli(p=0.2).rvs(size=N0, random_state=42)

x0 = np.concatenate((xn, xe), axis=0)
y0 = np.concatenate((yn, ye), axis=0)
[4]:
def plot_distribution(x, y):
    plt.hist(x[y == 0], label="n_nonevent", color="b", alpha=0.5)
    plt.hist(x[y == 1], label="n_event", color="r", alpha=0.5)
    plt.legend()
    plt.show()
[5]:
plot_distribution(x0, y0)
../_images/tutorials_tutorial_binary_under_uncertainty_9_0.png

Scenario 1: Good (Optimistic): A customer with a low score has a much higher event rate (default rate, churn, etc.) than a customer with a high score. The non-event and event populations are very well separated, with minimal overlap.

[6]:
N1 = int(5e4)

xe = stats.beta(a=25, b=50).rvs(size=N1, random_state=42)
ye = stats.bernoulli(p=0.9).rvs(size=N1, random_state=42)
xn = stats.beta(a=22, b=25).rvs(size=N1, random_state=42)
yn = stats.bernoulli(p=0.05).rvs(size=N1, random_state=42)

x1 = np.concatenate((xn, xe), axis=0)
y1 = np.concatenate((yn, ye), axis=0)
[7]:
plot_distribution(x1, y1)
../_images/tutorials_tutorial_binary_under_uncertainty_12_0.png

Scenario 2: Bad (Pessimistic): Customer behavior cannot be accurately segmented, and event rates increase across the board. The non-event and event populations overlap almost entirely.

[8]:
N2 = int(5e4)

xe = stats.beta(a=4, b=6).rvs(size=N2, random_state=42)
ye = stats.bernoulli(p=0.7).rvs(size=N2, random_state=42)
xn = stats.beta(a=8, b=10).rvs(size=N2, random_state=42)
yn = stats.bernoulli(p=0.4).rvs(size=N2, random_state=42)

x2 = np.concatenate((xn, xe), axis=0)
y2 = np.concatenate((yn, ye), axis=0)
[9]:
plot_distribution(x2, y2)
../_images/tutorials_tutorial_binary_under_uncertainty_15_0.png
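
To make the three scenario descriptions more concrete, the sketch below computes the theoretical event rate as a function of the score implied by each generator above. It assumes, as in the cells above, that the two sub-populations in each scenario have equal size, so the mixture weights cancel. The names `event_rate_curve` and `scenarios` are illustrative and not part of OptBinning.

```python
import numpy as np
from scipy import stats


def event_rate_curve(x, dist_e, p_e, dist_n, p_n):
    """P(event | score = x) for an equal-weight mixture of two populations.

    dist_e / dist_n are the score distributions of the two sub-populations,
    p_e / p_n are their Bernoulli event probabilities.
    """
    fe = dist_e.pdf(x)
    fn = dist_n.pdf(x)
    return (fe * p_e + fn * p_n) / (fe + fn)


# Parameters copied from the scenario generation cells above.
scenarios = {
    "normal": (stats.beta(4, 15), 0.7, stats.beta(6, 8), 0.2),
    "good": (stats.beta(25, 50), 0.9, stats.beta(22, 25), 0.05),
    "bad": (stats.beta(4, 6), 0.7, stats.beta(8, 10), 0.4),
}

grid = np.linspace(0.05, 0.8, 16)
curves = {name: event_rate_curve(grid, *params)
          for name, params in scenarios.items()}

for name, rate in curves.items():
    print(f"{name:>6}: min={rate.min():.3f} max={rate.max():.3f}")
```

In this sketch the good scenario spans a much wider range of event rates than the bad one, which is another way to see why the bad scenario is hard to segment.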

Scenario-based stochastic optimal binning

Prepare the scenario data and instantiate an SBOptimalBinning object. We set a descending monotonicity constraint on the event rate and a minimum bin size.

[10]:
X = [x0, x1, x2]
Y = [y0, y1, y2]
[11]:
sboptb = SBOptimalBinning(monotonic_trend="descending", min_bin_size=0.05)
sboptb.fit(X, Y)
[11]:
SBOptimalBinning(min_bin_size=0.05, monotonic_trend='descending')
[12]:
sboptb.status
[12]:
'OPTIMAL'

We obtain only three splits, but they are feasible in every scenario.

[13]:
sboptb.splits
[13]:
array([0.28578988, 0.36384453, 0.43260857])
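
Feasibility here means that, in every scenario, each bin holds at least min_bin_size of the records and the event rate does not increase from one bin to the next. A minimal NumPy sketch of such a check follows. `check_binning` is a hypothetical helper written for illustration, not an OptBinning function, and it ignores special and missing values. The demo data are synthetic.

```python
import numpy as np


def check_binning(x, y, splits, min_bin_size=0.05):
    """Return (size_ok, descending_ok) for the bins induced by `splits`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    splits = np.asarray(splits, dtype=float)

    # Bin i collects the values with splits[i-1] <= x < splits[i].
    idx = np.digitize(x, splits)
    n_bins = len(splits) + 1
    counts = np.bincount(idx, minlength=n_bins)
    events = np.bincount(idx, weights=y, minlength=n_bins)

    # Every bin must hold at least min_bin_size of all records.
    size_ok = bool(np.all(counts >= min_bin_size * x.size))

    # Event rate per bin (empty bins get rate 0) must never increase.
    rates = events / np.maximum(counts, 1)
    descending_ok = bool(np.all(np.diff(rates) <= 0))
    return size_ok, descending_ok


# Synthetic demo: event probability falls linearly with the score.
rng = np.random.default_rng(0)
x_demo = rng.uniform(size=10_000)
y_demo = (rng.uniform(size=10_000) < 0.8 - 0.6 * x_demo).astype(int)

print(check_binning(x_demo, y_demo, [0.25, 0.5, 0.75]))
print(check_binning(x_demo, y_demo, [0.01, 0.5]))  # first bin is too small
```

With the notebook variables, the same helper could be applied to each pair in zip(X, Y) using sboptb.splits.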
[14]:
sboptb.information(print_level=2)
optbinning (Version 0.19.0)
Copyright (c) 2019-2024 Guillermo Navas-Palencia, Apache License 2.0

  Begin options
    name                                       * d
    prebinning_method                   cart   * d
    max_n_prebins                         20   * d
    min_prebin_size                     0.05   * d
    min_n_bins                            no   * d
    max_n_bins                            no   * d
    min_bin_size                        0.05   * U
    max_bin_size                          no   * d
    monotonic_trend               descending   * U
    min_event_rate_diff                    0   * d
    max_pvalue                            no   * d
    max_pvalue_policy            consecutive   * d
    class_weight                          no   * d
    user_splits                           no   * d
    user_splits_fixed                     no   * d
    special_codes                         no   * d
    split_digits                          no   * d
    time_limit                           100   * d
    verbose                            False   * d
  End options

  Name    : UNKNOWN
  Status  : OPTIMAL

  Pre-binning statistics
    Number of pre-bins                    16
    Number of refinements                  1

  Solver statistics
    Type                                  cp
    Number of booleans                    40
    Number of branches                    91
    Number of conflicts                    1
    Objective value                  2736534
    Best objective bound             2736534

  Timing
    Total time                          1.21 sec
    Pre-processing                      0.01 sec   (  0.90%)
    Pre-binning                         0.70 sec   ( 58.18%)
    Solver                              0.49 sec   ( 40.72%)
      model generation                  0.44 sec   ( 90.22%)
      optimizer                         0.05 sec   (  9.78%)
    Post-processing                     0.00 sec   (  0.10%)

The binning table

Like the other optimal binning algorithms in OptBinning, SBOptimalBinning returns a binning table displaying the binned data aggregated over all scenarios.

[15]:
sboptb.binning_table.build()
[15]:
        Bin            Count  Count (%)  Non-event   Event  Event rate        WoE        IV        JS
0       (-inf, 0.29)  119678   0.299195      42005   77673    0.649017  -0.688603  0.138209  0.016943
1       [0.29, 0.36)   79729   0.199323      32837   46892    0.588142  -0.430175  0.036613  0.004542
2       [0.36, 0.43)   68378   0.170945      39045   29333    0.428983   0.212118  0.007633  0.000952
3       [0.43, inf)   132215   0.330537      93498   38717    0.292834   0.807778  0.201811  0.024562
4       Special            0   0.000000          0       0    0.000000   0.000000  0.000000  0.000000
5       Missing            0   0.000000          0       0    0.000000   0.000000  0.000000  0.000000
Totals                400000   1.000000     207385  192615    0.481538             0.384266  0.046999
[16]:
sboptb.binning_table.plot(metric="event_rate")
../_images/tutorials_tutorial_binary_under_uncertainty_27_0.png
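
The WoE and IV columns can be reproduced from the Non-event and Event counts alone. The sketch below uses the counts of the four regular bins in the table above and the usual definitions: WoE is the log ratio of the non-event share to the event share of a bin, and IV sums the share difference times WoE. The variable names are illustrative, and the code is a check on the table rather than the library's implementation.

```python
import numpy as np

# Non-event and event counts of bins 0-3, copied from the binning table.
non_event = np.array([42005, 32837, 39045, 93498])
event = np.array([77673, 46892, 29333, 38717])

# Share of all non-events / events that falls in each bin.
p_non_event = non_event / non_event.sum()
p_event = event / event.sum()

# Weight of evidence per bin and total information value.
woe = np.log(p_non_event / p_event)
iv_per_bin = (p_non_event - p_event) * woe
iv = iv_per_bin.sum()

print(np.round(woe, 6))
print(np.round(iv_per_bin, 6), round(float(iv), 6))
```

The totals agree with the IV reported in the Totals row of the table (0.384266).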
[17]:
sboptb.binning_table.analysis()
---------------------------------------------
OptimalBinning: Binary Binning Table Analysis
---------------------------------------------

  General metrics

    Gini index               0.33117510
    IV (Jeffrey)             0.38426602
    JS (Jensen-Shannon)      0.04699884
    Hellinger                0.04750871
    Triangular               0.18408357
    KS                       0.28582022
    HHI                      0.26772434
    HHI (normalized)         0.12126921
    Cramer's V               0.30285954
    Quality score            0.87798527

  Monotonic trend            descending

  Significance tests

    Bin A  Bin B  t-statistic       p-value  P[A > B]     P[B > A]
        0      1   756.303469 1.709260e-166       1.0 1.110223e-16
        1      2  3732.973381  0.000000e+00       1.0 1.110223e-16
        2      3  3726.998391  0.000000e+00       1.0 1.110223e-16

Expected value solution (EVS)

The expected value solution (EVS) is the optimal binning computed on the normal (expected) scenario alone.

[18]:
optb = OptimalBinning(monotonic_trend="descending", min_bin_size=0.05)
optb.fit(x0, y0)
[18]:
OptimalBinning(min_bin_size=0.05, monotonic_trend='descending')
[19]:
optb.binning_table.build()
[19]:
        Bin            Count  Count (%)  Non-event   Event  Event rate        WoE        IV        JS
0       (-inf, 0.10)   10255   0.051275       3061    7194    0.701511  -1.054853  0.054945  0.006566
1       [0.10, 0.14)   12519   0.062595       3911    8608    0.687595  -0.989246  0.059422  0.007139
2       [0.14, 0.18)   18333   0.091665       6065   12268    0.669176  -0.904807  0.073418  0.008877
3       [0.18, 0.20)   13631   0.068155       4884    8747    0.641699  -0.783094  0.041320  0.005037
4       [0.20, 0.23)   14606   0.073030       5684    8922    0.610845  -0.651212  0.030891  0.003795
5       [0.23, 0.27)   17995   0.089975       8043    9952    0.553043  -0.413319  0.015470  0.001920
6       [0.27, 0.30)   13047   0.065235       6672    6375    0.488618  -0.154812  0.001572  0.000196
7       [0.30, 0.35)   18825   0.094125      11158    7667    0.407278   0.174884  0.002847  0.000355
8       [0.35, 0.39)   16401   0.082005      10903    5498    0.335223   0.484306  0.018430  0.002282
9       [0.39, 0.44)   18759   0.093795      13688    5071    0.270324   0.792634  0.053994  0.006578
10      [0.44, 0.49)   14549   0.072745      11273    3276    0.225170   1.035440  0.068446  0.008193
11      [0.49, 0.54)   11019   0.055095       8697    2322    0.210727   1.120202  0.059684  0.007093
12      [0.54, 0.60)   10030   0.050150       7957    2073    0.206680   1.144708  0.056454  0.006695
13      [0.60, inf)    10031   0.050155       7988    2043    0.203669   1.163174  0.058080  0.006877
14      Special            0   0.000000          0       0    0.000000   0.000000  0.000000  0.000000
15      Missing            0   0.000000          0       0    0.000000   0.000000  0.000000  0.000000
Totals                200000   1.000000     109984   90016    0.450080             0.594974  0.071603
[20]:
optb.binning_table.plot(metric="event_rate")
../_images/tutorials_tutorial_binary_under_uncertainty_33_0.png
[21]:
optb.binning_table.analysis()
---------------------------------------------
OptimalBinning: Binary Binning Table Analysis
---------------------------------------------

  General metrics

    Gini index               0.42141055
    IV (Jeffrey)             0.59497411
    JS (Jensen-Shannon)      0.07160267
    Hellinger                0.07295186
    Triangular               0.27638899
    KS                       0.34108533
    HHI                      0.07501900
    HHI (normalized)         0.01335360
    Cramer's V               0.36927482
    Quality score            0.16335319

  Monotonic trend            descending

  Significance tests

    Bin A  Bin B  t-statistic      p-value  P[A > B]     P[B > A]
        0      1     5.139745 2.338408e-02  0.988706 1.129409e-02
        1      2    11.534993 6.829832e-04  0.999721 2.787731e-04
        2      3    26.208899 3.064073e-07  1.000000 7.445353e-09
        3      4    28.661681 8.619251e-08  1.000000 4.436704e-09
        4      5   110.500800 7.611468e-26  1.000000 1.110223e-16
        5      6   125.906119 3.223792e-29  1.000000 1.110223e-16
        6      7   206.865709 6.632897e-47  1.000000 1.110223e-16
        7      8   194.419542 3.449032e-44  1.000000 1.110223e-16
        8      9   175.309976 5.122903e-40  1.000000 1.110223e-16
        9     10    88.957203 4.034468e-21  1.000000 1.110223e-16
       10     11     7.648694 5.681344e-03  0.997558 2.442113e-03
       11     12     0.520543 4.706103e-01  0.764881 2.351195e-01
       12     13     0.278879 5.974371e-01  0.701329 2.986709e-01
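
The last rows of the significance table show why the expected value solution is fragile: adjacent bins 11 and 12 have nearly the same event rate. The figures reported for that pair appear to match a Pearson chi-square test on the 2x2 table of non-event and event counts, without continuity correction. That reading is an assumption based on matching the numbers, not a statement about OptBinning internals. A short sketch with SciPy:

```python
from scipy.stats import chi2_contingency

# Rows: bins 11 and 12 of the expected value solution.
# Columns: non-event and event counts, copied from the binning table.
table = [[8697, 2322],
         [7957, 2073]]

# correction=False disables the Yates continuity correction.
statistic, p_value, dof, expected = chi2_contingency(table, correction=False)

print(f"statistic={statistic:.6f}  p-value={p_value:.6f}  dof={dof}")
```

A p-value this large means the data give no evidence that the two bins differ in event rate, so the split between them adds little.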

Scenario analysis

Scenario 0: Normal (Realistic)

[22]:
bt0 = sboptb.binning_table_scenario(scenario_id=0)
bt0.build()
[22]:
        Bin            Count  Count (%)  Non-event   Event  Event rate        WoE        IV        JS
0       (-inf, 0.29)   93851   0.469255      34903   58948    0.628102  -0.724430  0.244506  0.029912
1       [0.29, 0.36)   32141   0.160705      18945   13196    0.410566   0.161279  0.004138  0.000517
2       [0.36, 0.43)   24488   0.122440      17289    7199    0.293981   0.675781  0.052184  0.006402
3       [0.43, inf)    49520   0.247600      38847   10673    0.215529   1.091566  0.256123  0.030515
4       Special            0   0.000000          0       0    0.000000   0.000000  0.000000  0.000000
5       Missing            0   0.000000          0       0    0.000000   0.000000  0.000000  0.000000
Totals                200000   1.000000     109984   90016    0.450080             0.556952  0.067345
[23]:
bt0.plot(metric="event_rate")
../_images/tutorials_tutorial_binary_under_uncertainty_38_0.png
[24]:
optb0 = OptimalBinning(monotonic_trend="descending", min_bin_size=0.05)
optb0.fit(x0, y0)
[24]:
OptimalBinning(min_bin_size=0.05, monotonic_trend='descending')
[25]:
optb0.binning_table.build()
[25]:
        Bin            Count  Count (%)  Non-event   Event  Event rate        WoE        IV        JS
0       (-inf, 0.10)   10255   0.051275       3061    7194    0.701511  -1.054853  0.054945  0.006566
1       [0.10, 0.14)   12519   0.062595       3911    8608    0.687595  -0.989246  0.059422  0.007139
2       [0.14, 0.18)   18333   0.091665       6065   12268    0.669176  -0.904807  0.073418  0.008877
3       [0.18, 0.20)   13631   0.068155       4884    8747    0.641699  -0.783094  0.041320  0.005037
4       [0.20, 0.23)   14606   0.073030       5684    8922    0.610845  -0.651212  0.030891  0.003795
5       [0.23, 0.27)   17995   0.089975       8043    9952    0.553043  -0.413319  0.015470  0.001920
6       [0.27, 0.30)   13047   0.065235       6672    6375    0.488618  -0.154812  0.001572  0.000196
7       [0.30, 0.35)   18825   0.094125      11158    7667    0.407278   0.174884  0.002847  0.000355
8       [0.35, 0.39)   16401   0.082005      10903    5498    0.335223   0.484306  0.018430  0.002282
9       [0.39, 0.44)   18759   0.093795      13688    5071    0.270324   0.792634  0.053994  0.006578
10      [0.44, 0.49)   14549   0.072745      11273    3276    0.225170   1.035440  0.068446  0.008193
11      [0.49, 0.54)   11019   0.055095       8697    2322    0.210727   1.120202  0.059684  0.007093
12      [0.54, 0.60)   10030   0.050150       7957    2073    0.206680   1.144708  0.056454  0.006695
13      [0.60, inf)    10031   0.050155       7988    2043    0.203669   1.163174  0.058080  0.006877
14      Special            0   0.000000          0       0    0.000000   0.000000  0.000000  0.000000
15      Missing            0   0.000000          0       0    0.000000   0.000000  0.000000  0.000000
Totals                200000   1.000000     109984   90016    0.450080             0.594974  0.071603
[26]:
optb0.binning_table.plot(metric="event_rate")
../_images/tutorials_tutorial_binary_under_uncertainty_41_0.png

Apply expected value solution to scenario 0.

[27]:
evs_optb0 = OptimalBinning(user_splits=optb.splits)
evs_optb0.fit(x0, y0)
[27]:
OptimalBinning(user_splits=array([0.1008646 , 0.13640077, 0.17635711, 0.20390539, 0.23334569,
       0.27135116, 0.30051835, 0.34623086, 0.38964605, 0.44464479,
       0.49356021, 0.53990114, 0.59801421]))
[28]:
evs_optb0.binning_table.build()
[28]:
        Bin            Count  Count (%)  Non-event   Event  Event rate        WoE        IV        JS
0       (-inf, 0.10)   10255   0.051275       3061    7194    0.701511  -1.054853  0.054945  0.006566
1       [0.10, 0.14)   12519   0.062595       3911    8608    0.687595  -0.989246  0.059422  0.007139
2       [0.14, 0.18)   18333   0.091665       6065   12268    0.669176  -0.904807  0.073418  0.008877
3       [0.18, 0.20)   13631   0.068155       4884    8747    0.641699  -0.783094  0.041320  0.005037
4       [0.20, 0.23)   14606   0.073030       5684    8922    0.610845  -0.651212  0.030891  0.003795
5       [0.23, 0.27)   17995   0.089975       8043    9952    0.553043  -0.413319  0.015470  0.001920
6       [0.27, 0.30)   13047   0.065235       6672    6375    0.488618  -0.154812  0.001572  0.000196
7       [0.30, 0.35)   18825   0.094125      11158    7667    0.407278   0.174884  0.002847  0.000355
8       [0.35, 0.39)   16401   0.082005      10903    5498    0.335223   0.484306  0.018430  0.002282
9       [0.39, 0.44)   18759   0.093795      13688    5071    0.270324   0.792634  0.053994  0.006578
10      [0.44, 0.49)   14549   0.072745      11273    3276    0.225170   1.035440  0.068446  0.008193
11      [0.49, 0.54)   11019   0.055095       8697    2322    0.210727   1.120202  0.059684  0.007093
12      [0.54, 0.60)   10030   0.050150       7957    2073    0.206680   1.144708  0.056454  0.006695
13      [0.60, inf)    10031   0.050155       7988    2043    0.203669   1.163174  0.058080  0.006877
14      Special            0   0.000000          0       0    0.000000   0.000000  0.000000  0.000000
15      Missing            0   0.000000          0       0    0.000000   0.000000  0.000000  0.000000
Totals                200000   1.000000     109984   90016    0.450080             0.594974  0.071603
[29]:
evs_optb0.binning_table.plot(metric="event_rate")
../_images/tutorials_tutorial_binary_under_uncertainty_45_0.png

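The expected value solution was computed on scenario 0, so applying it to scenario 0 reproduces the optimal binning above: every bin satisfies the min_bin_size constraint and the event rate is monotonically descending. The solution is therefore feasible, and its IV is recorded as EVS_0.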

[30]:
EVS_0 = 0.594974

Scenario 1: Good (Optimistic)

[31]:
bt1 = sboptb.binning_table_scenario(scenario_id=1)
bt1.build()
[31]:
        Bin            Count  Count (%)  Non-event   Event  Event rate        WoE        IV        JS
0       (-inf, 0.29)    9840    0.09840       1126    8714    0.885569  -2.146624  0.347828  0.036679
1       [0.29, 0.36)   29807    0.29807       5902   23905    0.801993  -1.499161  0.586072  0.067087
2       [0.36, 0.43)   24262    0.24262      12658   11604    0.478279  -0.013425  0.000044  0.000005
3       [0.43, inf)    36091    0.36091      32821    3270    0.090604   2.205914  1.226988  0.128301
4       Special            0    0.00000          0       0    0.000000   0.000000  0.000000  0.000000
5       Missing            0    0.00000          0       0    0.000000   0.000000  0.000000  0.000000
Totals                100000    1.00000      52507   47493    0.474930             2.160931  0.232072
[32]:
bt1.plot(metric="event_rate")
../_images/tutorials_tutorial_binary_under_uncertainty_50_0.png
[33]:
optb1 = OptimalBinning(monotonic_trend="descending", min_bin_size=0.05)
optb1.fit(x1, y1)
[33]:
OptimalBinning(min_bin_size=0.05, monotonic_trend='descending')
[34]:
optb1.binning_table.build()
[34]:
        Bin            Count  Count (%)  Non-event   Event  Event rate        WoE        IV        JS
0       (-inf, 0.28)    8209    0.08209        908    7301    0.889390  -2.184886  0.298095  0.031264
1       [0.28, 0.30)    5545    0.05545        738    4807    0.866907  -1.974249  0.172075  0.018581
2       [0.30, 0.31)    5186    0.05186        777    4409    0.850174  -1.836327  0.143301  0.015756
3       [0.31, 0.33)    5837    0.05837        956    4881    0.836217  -1.730712  0.146359  0.016307
4       [0.33, 0.34)    5176    0.05176       1077    4099    0.791924  -1.436928  0.094544  0.010896
5       [0.34, 0.36)    7055    0.07055       1760    5295    0.750532  -1.201813  0.093706  0.011056
6       [0.36, 0.38)    8537    0.08537       2882    5655    0.662411  -0.774420  0.049704  0.006062
7       [0.38, 0.40)    6189    0.06189       2802    3387    0.547261  -0.289975  0.005205  0.000648
8       [0.40, 0.41)    5058    0.05058       2862    2196    0.434164   0.164519  0.001360  0.000170
9       [0.41, 0.44)    8246    0.08246       5781    2465    0.298933   0.752021  0.043766  0.005345
10      [0.44, 0.45)    5253    0.05253       4321     932    0.177422   1.433545  0.089840  0.010358
11      [0.45, 0.47)    5009    0.05009       4420     589    0.117588   1.915105  0.137461  0.014960
12      [0.47, 0.49)    5204    0.05204       4780     424    0.081476   2.322098  0.190662  0.019603
13      [0.49, 0.53)    8825    0.08825       8283     542    0.061416   2.626330  0.384332  0.037733
14      [0.53, 0.56)    5061    0.05061       4807     254    0.050188   2.840130  0.244824  0.023237
15      [0.56, inf)     5610    0.05610       5353     257    0.045811   2.935972  0.283430  0.026488
16      Special            0    0.00000          0       0    0.000000   0.000000  0.000000  0.000000
17      Missing            0    0.00000          0       0    0.000000   0.000000  0.000000  0.000000
Totals                100000    1.00000      52507   47493    0.474930             2.378665  0.248465
[35]:
optb1.binning_table.plot(metric="event_rate")
../_images/tutorials_tutorial_binary_under_uncertainty_53_0.png

Apply expected value solution to scenario 1.

[36]:
evs_optb1 = OptimalBinning(user_splits=optb.splits)
evs_optb1.fit(x1, y1)
[36]:
OptimalBinning(user_splits=array([0.1008646 , 0.13640077, 0.17635711, 0.20390539, 0.23334569,
       0.27135116, 0.30051835, 0.34623086, 0.38964605, 0.44464479,
       0.49356021, 0.53990114, 0.59801421]))
[37]:
evs_optb1.binning_table.build()
[37]:
        Bin            Count  Count (%)  Non-event   Event  Event rate        WoE        IV        JS
0       (-inf, 0.20)     247    0.00247         26     221    0.894737  -2.240430  0.009316  0.000969
1       [0.20, 0.23)    1092    0.01092        118     974    0.891941  -2.211091  0.040377  0.004219
2       [0.23, 0.27)    5037    0.05037        566    4471    0.887632  -2.167137  0.180654  0.018995
3       [0.27, 0.30)    7918    0.07918       1019    6899    0.871306  -2.012919  0.253339  0.027214
4       [0.30, 0.35)   18126    0.18126       3313   14813    0.817224  -1.598015  0.397590  0.045005
5       [0.35, 0.39)   17091    0.17091       5742   11349    0.664034  -0.781686  0.101310  0.012351
6       [0.39, 0.44)   18095    0.18095      11857    6238    0.344736   0.541895  0.051194  0.006322
7       [0.44, 0.49)   14295    0.14295      12739    1556    0.108849   2.002186  0.420164  0.045199
8       [0.49, 0.54)   10111    0.10111       9523     588    0.058154   2.684374  0.453620  0.044133
9       [0.54, 0.60)    6215    0.06215       5918     297    0.047788   2.891658  0.307832  0.028976
10      [0.60, inf)     1773    0.01773       1686      87    0.049069   2.863842  0.086712  0.008199
11      Special            0    0.00000          0       0    0.000000   0.000000  0.000000  0.000000
12      Missing            0    0.00000          0       0    0.000000   0.000000  0.000000  0.000000
Totals                100000    1.00000      52507   47493    0.474930             2.302108  0.241582
[38]:
evs_optb1.binning_table.plot(metric="event_rate")
../_images/tutorials_tutorial_binary_under_uncertainty_57_0.png
[39]:
evs_optb1.binning_table.analysis()
---------------------------------------------
OptimalBinning: Binary Binning Table Analysis
---------------------------------------------

  General metrics

    Gini index               0.72566718
    IV (Jeffrey)             2.30210757
    JS (Jensen-Shannon)      0.24158211
    Hellinger                0.26182242
    Triangular               0.84830395
    KS                       0.61004329
    HHI                      0.13857518
    HHI (normalized)         0.06678978
    Cramer's V               0.64902999
    Quality score            0.00000000

  Monotonic trend                valley

  Significance tests

    Bin A  Bin B  t-statistic       p-value  P[A > B]     P[B > A]
        0      1     0.016401  8.980961e-01  0.566231 4.337689e-01
        1      2     0.168135  6.817748e-01  0.666387 3.336129e-01
        2      3     7.641448  5.704212e-03  0.997457 2.543322e-03
        3      4   116.236493  4.218674e-27  1.000000 1.110223e-16
        4      5  1080.747496 5.050568e-237  1.000000 1.110223e-16
        5      6  3584.325798  0.000000e+00  1.000000 1.110223e-16
        6      7  2431.847481  0.000000e+00  1.000000 1.110223e-16
        7      8   189.938108  3.279750e-43  1.000000 1.110223e-16
        8      9     8.068486  4.504174e-03  0.998346 1.654368e-03
        9     10     0.049526  8.238907e-01  0.420232 5.797684e-01

The expected value solution applied to scenario 1 satisfies neither the min_bin_size constraint nor the monotonicity constraint, hence the solution is not feasible.
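
Both violations can be read directly off the binning table above. The following sketch copies the Count (%) and Event rate columns of the eleven regular bins and locates the offending bins. The variable names are illustrative, and the 0.05 threshold is the min_bin_size used throughout this tutorial.

```python
import numpy as np

# Count (%) and Event rate of bins 0-10 when the expected value
# solution is applied to scenario 1 (copied from the table above).
count_frac = np.array([0.00247, 0.01092, 0.05037, 0.07918, 0.18126, 0.17091,
                       0.18095, 0.14295, 0.10111, 0.06215, 0.01773])
event_rate = np.array([0.894737, 0.891941, 0.887632, 0.871306, 0.817224,
                       0.664034, 0.344736, 0.108849, 0.058154, 0.047788,
                       0.049069])

# Bins holding less than min_bin_size of the records.
too_small = np.flatnonzero(count_frac < 0.05)

# Bins whose event rate is higher than that of the previous bin,
# which breaks the descending trend.
rising = np.flatnonzero(np.diff(event_rate) > 0) + 1

print("bins below min_bin_size:", too_small.tolist())
print("bins breaking the descending trend:", rising.tolist())
```

Bins 0, 1 and 10 fall below the 5% threshold, and the event rate rises from bin 9 to bin 10, which is why the analysis reports a valley trend instead of a descending one.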

[40]:
EVS_1 = -np.inf

Scenario 2: Bad (Pessimistic)

[41]:
bt2 = sboptb.binning_table_scenario(scenario_id=2)
bt2.build()
[41]:
        Bin            Count  Count (%)  Non-event   Event  Event rate        WoE            IV            JS
0       (-inf, 0.29)   15987    0.15987       5976   10011    0.626196  -0.310979  1.509941e-02  1.879858e-03
1       [0.29, 0.36)   17781    0.17781       7990    9791    0.550644   0.001682  5.028570e-07  6.285711e-08
2       [0.36, 0.43)   19628    0.19628       9098   10530    0.536479   0.058781  6.800268e-04  8.499112e-05
3       [0.43, inf)    46604    0.46604      21830   24774    0.531585   0.078445  2.877876e-03  3.596423e-04
4       Special            0    0.00000          0       0    0.000000   0.000000  0.000000e+00  0.000000e+00
5       Missing            0    0.00000          0       0    0.000000   0.000000  0.000000e+00  0.000000e+00
Totals                100000    1.00000      44894   55106    0.551060             1.865782e-02  2.324554e-03
[42]:
bt2.plot(metric="event_rate")
../_images/tutorials_tutorial_binary_under_uncertainty_63_0.png
[43]:
optb2 = OptimalBinning(monotonic_trend="descending", min_bin_size=0.05)
optb2.fit(x2, y2)
[43]:
OptimalBinning(min_bin_size=0.05, monotonic_trend='descending')
[44]:
optb2.binning_table.build()
[44]:
        Bin            Count  Count (%)  Non-event   Event  Event rate        WoE        IV        JS
0       (-inf, 0.23)    7556    0.07556       2543    5013    0.663446  -0.473736  0.016261  0.002014
1       [0.23, 0.29)    9657    0.09657       3918    5739    0.594284  -0.176749  0.002982  0.000372
2       [0.29, 0.33)    8559    0.08559       3801    4758    0.555906  -0.019609  0.000033  0.000004
3       [0.33, 0.39)   15848    0.15848       7234    8614    0.543539   0.030358  0.000146  0.000018
4       [0.39, inf)    58380    0.58380      27398   30982    0.530695   0.082018  0.003941  0.000493
5       Special            0    0.00000          0       0    0.000000   0.000000  0.000000  0.000000
6       Missing            0    0.00000          0       0    0.000000   0.000000  0.000000  0.000000
Totals                100000    1.00000      44894   55106    0.551060             0.023364  0.002901
[45]:
optb2.binning_table.plot(metric="event_rate")
../_images/tutorials_tutorial_binary_under_uncertainty_66_0.png

Apply expected value solution to scenario 2.

[46]:
evs_optb2 = OptimalBinning(user_splits=optb.splits)
evs_optb2.fit(x2, y2)
[46]:
OptimalBinning(user_splits=array([0.1008646 , 0.13640077, 0.17635711, 0.20390539, 0.23334569,
       0.27135116, 0.30051835, 0.34623086, 0.38964605, 0.44464479,
       0.49356021, 0.53990114, 0.59801421]))
[47]:
evs_optb2.binning_table.build()
[47]:
        Bin            Count  Count (%)  Non-event   Event  Event rate        WoE        IV            JS
0       (-inf, 0.14)    1292    0.01292        405     887    0.686533  -0.579003  0.004096  5.050214e-04
1       [0.14, 0.18)    1850    0.01850        598    1252    0.676757  -0.533952  0.005019  6.200181e-04
2       [0.18, 0.20)    2002    0.02002        709    1293    0.645854  -0.395910  0.003037  3.771741e-04
3       [0.20, 0.23)    2944    0.02944       1049    1895    0.643682  -0.386427  0.004259  5.291176e-04
4       [0.23, 0.27)    5326    0.05326       2134    3192    0.599324  -0.197695  0.002054  2.563524e-04
5       [0.27, 0.30)    5390    0.05390       2291    3099    0.574954  -0.097137  0.000506  6.318381e-05
6       [0.30, 0.35)   10414    0.10414       4667    5747    0.551853  -0.003207  0.000001  1.338664e-07
7       [0.35, 0.39)   11782    0.11782       5375    6407    0.543796   0.029322  0.000101  1.267992e-05
8       [0.39, 0.44)   15901    0.15901       7509    8392    0.527766   0.093778  0.001404  1.754450e-04
9       [0.44, 0.54)   23757    0.23757      11416   12341    0.519468   0.127043  0.003854  4.814509e-04
10      [0.54, 0.60)    9639    0.09639       4529    5110    0.530138   0.084256  0.000687  8.582857e-05
11      [0.60, inf)     9703    0.09703       4212    5491    0.565907  -0.060218  0.000351  4.382723e-05
12      Special            0    0.00000          0       0    0.000000   0.000000  0.000000  0.000000e+00
13      Missing            0    0.00000          0       0    0.000000   0.000000  0.000000  0.000000e+00
Totals                100000    1.00000      44894   55106    0.551060             0.025370  3.150233e-03
[48]:
evs_optb2.binning_table.plot(metric="event_rate")
../_images/tutorials_tutorial_binary_under_uncertainty_70_0.png
[49]:
evs_optb2.binning_table.analysis()
---------------------------------------------
OptimalBinning: Binary Binning Table Analysis
---------------------------------------------

  General metrics

    Gini index               0.07657686
    IV (Jeffrey)             0.02536981
    JS (Jensen-Shannon)      0.00315023
    Hellinger                0.00316066
    Triangular               0.01251904
    KS                       0.05109803
    HHI                      0.13267476
    HHI (normalized)         0.06595743
    Cramer's V               0.07812501
    Quality score            0.00318975

  Monotonic trend                valley

  Significance tests

    Bin A  Bin B  t-statistic      p-value     P[A > B]  P[B > A]
        0      1     0.334525 5.630065e-01 7.193040e-01  0.280696
        1      2     4.095897 4.298741e-02 9.789856e-01  0.021014
        2      3     0.024540 8.755193e-01 5.627139e-01  0.437286
        3      4    15.757594 7.199834e-05 9.999743e-01  0.000026
        4      5     6.563219 1.041079e-02 9.951398e-01  0.004860
        5      6     7.690930 5.549902e-03 9.973851e-01  0.002615
        6      7     1.448735 2.287310e-01 8.858561e-01  0.114144
        7      8     6.989473 8.199050e-03 9.961485e-01  0.003852
        8      9     2.628779 1.049424e-01 9.478637e-01  0.052136
        9     10     3.128995 7.691114e-02 3.817921e-02  0.961821
       10     11    24.977930 5.799032e-07 4.353127e-08  1.000000

The expected value solution applied to scenario 2 satisfies neither the min_bin_size constraint nor the monotonicity constraint, hence the solution is not feasible.

[50]:
EVS_2 = -np.inf

Expected value of perfect information (EVPI)

If we knew in advance which economic scenario would occur, we could use the optimal binning for that scenario. Averaged over the three equally likely scenarios, the resulting expected IV is:

[51]:
DIV0 = optb0.binning_table.iv
DIV1 = optb1.binning_table.iv
DIV2 = optb2.binning_table.iv
DIV = (DIV0 + DIV1 + DIV2) / 3
[52]:
DIV
[52]:
0.9990011753826167

However, this information is unlikely to be available in advance, so the best we can do in the long run is to use the stochastic programming solution, with expected IV:

[53]:
SIV = sboptb.binning_table.iv
[54]:
SIV
[54]:
0.38426601503532376

The difference between the two is the expected value of perfect information (EVPI):

[55]:
EVPI = DIV - SIV
EVPI
[55]:
0.6147351603472929
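
The average above gives each scenario a weight of 1/3. As a sketch of how the same calculation could be written for arbitrary scenario probabilities, the snippet below uses the rounded IV totals from the binning tables in this tutorial. The `probs` vector and the variable names are illustrative assumptions; with equal probabilities the result matches the EVPI above up to rounding.

```python
import numpy as np

# IV of the per-scenario optimal binnings (Totals rows of the
# optb0, optb1 and optb2 binning tables, rounded to 6 decimals).
iv_per_scenario = np.array([0.594974, 2.378665, 0.023364])

# Scenario probabilities; equal weights as in this tutorial.
probs = np.full(3, 1 / 3)

# Expected IV with perfect information about the incoming scenario.
wait_and_see = float(np.dot(probs, iv_per_scenario))

# IV of the stochastic solution (Totals row of the SBOptimalBinning table).
siv = 0.384266

evpi = wait_and_see - siv
print(f"expected IV with perfect information: {wait_and_see:.6f}")
print(f"EVPI: {evpi:.6f}")
```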

Value of stochastic solution (VSS)

The value of the stochastic solution (VSS) is the IV lost by ignoring uncertainty: the stochastic model IV minus the IV obtained by applying the expected value solution to each scenario. Applying the EVS yields infeasible solutions, scored as -inf, in scenarios 1 and 2, thus:

[56]:
VSS = SIV - (EVS_0 + EVS_1 + EVS_2)
VSS
[56]:
inf
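
The infinite value comes entirely from the convention of scoring an infeasible solution as -inf. The sketch below repeats the calculation with the scenario average instead of the sum used in the cell above. That substitution is an assumption made here for symmetry with the EVPI calculation, and it does not change the result: a single infeasible scenario is enough to make the VSS infinite.

```python
import numpy as np

# IV of the stochastic solution (Totals row of its binning table).
siv = 0.384266

# IV of the expected value solution in each scenario; infeasible
# solutions are scored as -inf, as in the cells above.
evs = np.array([0.594974, -np.inf, -np.inf])

n_infeasible = int(np.isinf(evs).sum())

# Average over equally likely scenarios; any -inf makes the mean -inf.
expected_evs = evs.mean()
vss = siv - expected_evs

print(f"infeasible scenarios: {n_infeasible}")
print(f"VSS: {vss}")
```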