Binning 2D tables
Binning table 2D: binary target

class optbinning.binning.multidimensional.binning_statistics_2d.BinningTable2D(name_x, name_y, dtype_x, dtype_y, splits_x, splits_y, m, n, n_nonevent, n_event, D, P)
Bases: optbinning.binning.binning_statistics.BinningTable
Binning table to summarize optimal binning of two numerical variables with respect to a binary target.
 Parameters
name_x (str, optional (default="")) – The name of variable x.
name_y (str, optional (default="")) – The name of variable y.
dtype_x (str, optional (default="numerical")) – The data type of variable x. Supported data type is “numerical” for continuous and ordinal variables.
dtype_y (str, optional (default="numerical")) – The data type of variable y. Supported data type is “numerical” for continuous and ordinal variables.
splits_x (numpy.ndarray) – List of split points for variable x.
splits_y (numpy.ndarray) – List of split points for variable y.
m (int) – Number of rows of the 2D array.
n (int) – Number of columns of the 2D array.
n_nonevent (numpy.ndarray) – Number of nonevents.
n_event (numpy.ndarray) – Number of events.
D (numpy.ndarray) – Event rate 2D array.
P (numpy.ndarray) – Bin indices 2D array.
Warning
This class is not intended to be instantiated by the user. It is preferable to use the class returned by the property binning_table, available in all optimal binning classes.
analysis(pvalue_test='chi2', n_samples=100, print_output=True)
Binning table analysis.
Statistical analysis of the binning table, computing the Gini index, Information Value (IV), Jensen-Shannon divergence, and the quality score. Additionally, several statistical significance tests between consecutive bins of the contingency table are performed: a frequentist test using the Chi-square test or Fisher's exact test, and a Bayesian A/B test using the beta distribution as a conjugate prior of the Bernoulli distribution.
 Parameters
pvalue_test (str, optional (default="chi2")) – The statistical test. Supported tests are “chi2” to choose the Chi-square test and “fisher” to choose Fisher's exact test.
n_samples (int, optional (default=100)) – The number of samples to run the Bayesian A/B testing between consecutive bins to compute the probability of the event rate of bin A being greater than the event rate of bin B.
print_output (bool (default=True)) – Whether to print analysis information.
Notes
The Chi-square test uses scipy.stats.chi2_contingency, and Fisher's exact test uses scipy.stats.fisher_exact.
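The Bayesian A/B test between two consecutive bins can be illustrated with a small Monte Carlo sketch. This is not the library's implementation: the function name prob_a_greater_b, the uniform Beta(1, 1) prior, and the fixed seed are assumptions made for illustration, and only the Python standard library is used.

```python
import random


def prob_a_greater_b(events_a, nonevents_a, events_b, nonevents_b,
                     n_samples=100, seed=0):
    """Monte Carlo estimate of P(event rate of bin A > event rate of bin B).

    Hypothetical helper: assumes a uniform Beta(1, 1) prior, so the
    posterior of each event rate is Beta(1 + events, 1 + nonevents).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        # Draw one plausible event rate per bin from its posterior.
        rate_a = rng.betavariate(1 + events_a, 1 + nonevents_a)
        rate_b = rng.betavariate(1 + events_b, 1 + nonevents_b)
        wins += rate_a > rate_b
    return wins / n_samples


# Bin A: 80 events out of 200 records; bin B: 20 events out of 200 records.
p_high = prob_a_greater_b(80, 120, 20, 180)
# Swapping the bins should give a probability close to the complement.
p_low = prob_a_greater_b(20, 180, 80, 120)
print(p_high, p_low)
```

A larger n_samples reduces the Monte Carlo noise of the estimate at the cost of more draws per pair of bins.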

build(show_digits=2, show_bin_xy=False, add_totals=True)
Build the binning table.
 Parameters
show_digits (int, optional (default=2)) – The number of significant digits of the bin column.
show_bin_xy (bool (default=False)) – Whether to show a single bin column with x and y.
add_totals (bool (default=True)) – Whether to add a last row with totals.
 Returns
binning_table
 Return type
pandas.DataFrame

property gini
The Gini coefficient or Accuracy Ratio.
The Gini coefficient is a quantitative measure of the discriminatory and predictive power of a variable. The Gini coefficient ranges from 0 to 1.
 Returns
gini
 Return type
float

property hellinger
The Hellinger divergence.
 Returns
hellinger
 Return type
float

property iv
The Information Value (IV) or Jeffreys' divergence measure.
The IV ranges from 0 to infinity.
 Returns
iv
 Return type
float
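As an illustration of the measure, the sketch below computes the IV from per-bin nonevent and event counts using the standard definition (the symmetric Kullback-Leibler, or Jeffreys, divergence between the two distributions). The function name information_value and the sample counts are hypothetical, the WoE sign convention shown is one common choice, and every bin is assumed to contain at least one event and one nonevent.

```python
import math


def information_value(n_nonevent, n_event):
    """IV from per-bin counts (hypothetical helper, standard definition).

    Assumes no bin has a zero count, otherwise the logarithm is undefined.
    """
    total_nonevent = sum(n_nonevent)
    total_event = sum(n_event)
    iv = 0.0
    for ne, e in zip(n_nonevent, n_event):
        p_ne = ne / total_nonevent  # share of all nonevents in this bin
        p_e = e / total_event       # share of all events in this bin
        woe = math.log(p_ne / p_e)  # Weight of Evidence of the bin
        iv += (p_ne - p_e) * woe    # each term is non-negative
    return iv


# Identical event and nonevent distributions carry no information.
iv_equal = information_value([50, 50], [10, 10])
# Distributions that differ across bins give a positive IV.
iv_split = information_value([80, 20], [5, 15])
print(iv_equal, iv_split)
```

Because each term has the form (a - b) * log(a / b), the sum is never negative, and it grows without bound as the two distributions separate.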

property js
The Jensen-Shannon divergence measure (JS).
The JS ranges from 0 to \(\log(2)\).
 Returns
js
 Return type
float
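The \(\log(2)\) upper bound can be checked numerically. The sketch below uses the textbook definition of the Jensen-Shannon divergence with natural logarithms; the function name jensen_shannon is hypothetical and the code is not taken from the library.

```python
import math


def jensen_shannon(p, q):
    """Jensen-Shannon divergence of two discrete distributions (natural log)."""
    def kl(a, b):
        # Kullback-Leibler divergence; terms with a zero probability
        # in `a` contribute nothing.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution halfway between p and q.
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Identical distributions: divergence is zero.
js_same = jensen_shannon([0.5, 0.5], [0.5, 0.5])
# Distributions with disjoint support: divergence reaches log(2).
js_disjoint = jensen_shannon([1.0, 0.0], [0.0, 1.0])
print(js_same, js_disjoint, math.log(2))
```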

property ks
The Kolmogorov-Smirnov statistic.
 Returns
ks
 Return type
float

plot(metric='woe', savefig=None)
Plot the binning table.
Visualize the Weight of Evidence or the event rate for each bin as a matrix, and the x and y trend.
 Parameters
metric (str, optional (default="woe")) – Supported metrics are “woe” to show the Weight of Evidence (WoE) measure and “event_rate” to show the event rate.
savefig (str or None (default=None)) – Path to save the plot figure.

property quality_score
The quality score (QS).
The QS is a rating of the quality and discriminatory power of a variable. The QS ranges from 0 to 1.
 Returns
quality_score
 Return type
float

property triangular
The triangular divergence.
 Returns
triangular
 Return type
float
Binning table 2D: continuous target

class optbinning.binning.multidimensional.binning_statistics_2d.ContinuousBinningTable2D(name_x, name_y, dtype_x, dtype_y, splits_x, splits_y, m, n, n_records, sums, stds, D, P)
Bases: optbinning.binning.binning_statistics.ContinuousBinningTable
Binning table to summarize optimal binning of two numerical variables with respect to a continuous target.
 Parameters
name_x (str, optional (default="")) – The name of variable x.
name_y (str, optional (default="")) – The name of variable y.
dtype_x (str, optional (default="numerical")) – The data type of variable x. Supported data type is “numerical” for continuous and ordinal variables.
dtype_y (str, optional (default="numerical")) – The data type of variable y. Supported data type is “numerical” for continuous and ordinal variables.
splits_x (numpy.ndarray) – List of split points for variable x.
splits_y (numpy.ndarray) – List of split points for variable y.
m (int) – Number of rows of the 2D array.
n (int) – Number of columns of the 2D array.
n_records (numpy.ndarray) – Number of records.
sums (numpy.ndarray) – Target sums.
stds (numpy.ndarray) – Target stds.
D (numpy.ndarray) – Mean 2D array.
P (numpy.ndarray) – Bin indices 2D array.
Warning
This class is not intended to be instantiated by the user. It is preferable to use the class returned by the property binning_table, available in all optimal binning classes.
analysis(print_output=True)
Binning table analysis.
Statistical analysis of the binning table, computing the Information Value (IV) and Herfindahl-Hirschman Index (HHI).
 Parameters
print_output (bool (default=True)) – Whether to print analysis information.
Notes
The IV for a continuous target is computed as follows:
\[IV = \sum_{i=1}^n |U_i - \mu| \frac{r_i}{r_T},\]where \(U_i\) is the target mean value for each bin, \(\mu\) is the total target mean, \(r_i\) is the number of records for each bin, and \(r_T\) is the total number of records.
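A short worked example may help with the formula. The sketch below assumes the summand is the absolute deviation \(|U_i - \mu|\) weighted by the bin's share of records, and derives the bin means from per-bin record counts and target sums. The function name continuous_iv and the sample values are hypothetical.

```python
def continuous_iv(n_records, sums):
    """IV for a continuous target from per-bin record counts and target sums.

    Hypothetical helper: assumes IV = sum(|U_i - mu| * r_i / r_T).
    """
    r_total = sum(n_records)   # r_T: total number of records
    mu = sum(sums) / r_total   # mu: total target mean
    iv = 0.0
    for r_i, s_i in zip(n_records, sums):
        u_i = s_i / r_i        # U_i: target mean of the bin
        iv += abs(u_i - mu) * r_i / r_total
    return iv


# Three bins with means 5, 10 and 15; the total mean is 12.5.
iv = continuous_iv(n_records=[10, 30, 60], sums=[50.0, 300.0, 900.0])
print(iv)  # 3.0
```

Here the three terms are 7.5 * 0.1, 2.5 * 0.3, and 2.5 * 0.6, so large bins close to the total mean can contribute as much as small bins far from it.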

build(show_digits=2, show_bin_xy=False, add_totals=True)
Build the binning table.
 Parameters
show_digits (int, optional (default=2)) – The number of significant digits of the bin column.
show_bin_xy (bool (default=False)) – Whether to show a single bin column with x and y.
add_totals (bool (default=True)) – Whether to add a last row with totals.
 Returns
binning_table
 Return type
pandas.DataFrame

property iv
The Information Value (IV).
The IV ranges from 0 to infinity.
 Returns
iv
 Return type
float

plot(savefig=None)
Plot the binning table.
Visualize the mean for each bin as a matrix, and the x and y trend.
 Parameters
savefig (str or None (default=None)) – Path to save the plot figure.

property quality_score
The quality score (QS).
The QS is a rating of the quality and discriminatory power of a variable. The QS ranges from 0 to 1.
 Returns
quality_score
 Return type
float

property woe
The sum of absolute WoEs.
This metric is computed as follows:
\[WoE = \sum_{i=1}^n |U_i - \mu|,\]where \(U_i\) is the target mean value for each bin, and \(\mu\) is the total target mean.
 Returns
woe
 Return type
float
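For comparison with the IV, the sum of absolute WoEs is unweighted: every bin counts equally regardless of its size. The sketch below assumes the summand is \(|U_i - \mu|\) and that \(\mu\) is the record-weighted total mean. The function name sum_abs_woe and the sample values are hypothetical.

```python
def sum_abs_woe(n_records, sums):
    """Unweighted sum of absolute deviations of bin means from the total mean.

    Hypothetical helper: assumes WoE = sum(|U_i - mu|).
    """
    mu = sum(sums) / sum(n_records)  # total target mean
    # Bin mean U_i is the bin's target sum divided by its record count.
    return sum(abs(s_i / r_i - mu) for r_i, s_i in zip(n_records, sums))


# Three bins with means 5, 10 and 15; the total mean is 12.5.
woe_total = sum_abs_woe(n_records=[10, 30, 60], sums=[50.0, 300.0, 900.0])
print(woe_total)  # 12.5
```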