Scorecard

class optbinning.scorecard.Scorecard(binning_process, estimator, scaling_method=None, scaling_method_params=None, intercept_based=False, reverse_scorecard=False, rounding=False, verbose=False)

Bases: optbinning.binning.base.Base, sklearn.base.BaseEstimator

Scorecard development given a binary or continuous target dtype.

Parameters
  • binning_process (object) – A BinningProcess instance.

  • estimator (object) – A supervised learning estimator with a fit and predict method that provides information about feature coefficients through a coef_ attribute. For binary classification, the estimator must include a predict_proba method.

  • scaling_method (str or None (default=None)) – The scaling method to control the range of the scores. Supported methods are “pdo_odds” and “min_max”. Method “pdo_odds” is only applicable for binary classification. If None, no scaling is applied.

  • scaling_method_params (dict or None (default=None)) – Dictionary with scaling method parameters. If scaling_method="pdo_odds" parameters required are: “pdo”, “odds”, and “scorecard_points”. If scaling_method="min_max" parameters required are “min” and “max”. If scaling_method=None, this parameter is not used.

  • intercept_based (bool (default=False)) – Build an intercept-based scorecard. An intercept-based scorecard modifies the original scorecard by setting the smallest point for each variable to zero and updating the intercept accordingly.

  • reverse_scorecard (bool (default=False)) – Whether to change the sense of the relationship between predictions and scorecard points to ascending/descending.

  • rounding (bool (default=False)) – Whether to round scorecard points. If scaling_method="min_max" a mixed-integer programming problem is solved to guarantee the minimum/maximum score after rounding. Otherwise, the scorecard points are rounded to the nearest integer.

  • verbose (bool (default=False)) – Enable verbose output.
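
The two scaling methods can be illustrated with a small sketch. It assumes the conventional credit-scoring definitions: under "pdo_odds", a sample whose odds equal "odds" receives "scorecard_points" and each doubling of the odds shifts the score by "pdo" points; under "min_max", raw scores are rescaled linearly to the requested range. The function names and the sign convention (higher odds give a higher score) are assumptions made for illustration and are not taken from the library's implementation.

```python
import math


def pdo_odds_score(sample_odds, pdo, odds, scorecard_points):
    """Map odds to a score under the conventional PDO/odds scaling.

    Assumed convention: higher odds -> higher score.
    """
    factor = pdo / math.log(2)  # points per unit of log-odds
    offset = scorecard_points - factor * math.log(odds)
    return offset + factor * math.log(sample_odds)


def min_max_score(raw, raw_min, raw_max, new_min, new_max):
    """Linearly rescale a raw score from [raw_min, raw_max] to [new_min, new_max]."""
    slope = (new_max - new_min) / (raw_max - raw_min)
    return new_min + slope * (raw - raw_min)


params = {"pdo": 20, "odds": 50, "scorecard_points": 600}
base = pdo_odds_score(50, **params)      # odds equal to the reference odds
doubled = pdo_odds_score(100, **params)  # reference odds doubled

lowest = min_max_score(-3.2, -3.2, 4.1, 300, 850)
highest = min_max_score(4.1, -3.2, 4.1, 300, 850)
```

With these definitions, the score at the reference odds equals "scorecard_points", and the difference between `doubled` and `base` equals "pdo".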

binning_process_

The external binning process.

Type

object

estimator_

The external estimator fit on the reduced dataset.

Type

object

intercept_

The intercept if intercept_based=True.

Type

float
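
The intercept-based transformation described for intercept_based=True can be sketched in plain Python. The dictionary layout and the helper name `to_intercept_based` are hypothetical and chosen for illustration only; they do not reflect the library's internals.

```python
def to_intercept_based(points, intercept=0.0):
    """Shift each variable so its smallest point is zero.

    The amount removed from each variable is added to the intercept,
    so the total score of any sample stays the same.
    """
    new_points = {}
    new_intercept = intercept
    for variable, bin_points in points.items():
        smallest = min(bin_points)
        new_points[variable] = [p - smallest for p in bin_points]
        new_intercept += smallest
    return new_points, new_intercept


# Hypothetical points per bin for two variables.
points = {"age": [12.0, 25.0, 40.0], "income": [-5.0, 10.0]}
shifted, intercept = to_intercept_based(points)

# Score of one sample (age in bin 1, income in bin 0), before and after.
before = points["age"][1] + points["income"][0]
after = intercept + shifted["age"][1] + shifted["income"][0]
```

Both `before` and `after` evaluate to the same total, which is the property the transformation is meant to preserve.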

decision_function(X)

Predict confidence scores for samples. The confidence score for a sample is proportional to the signed distance of that sample to the hyperplane.

Parameters

X (pandas.DataFrame (n_samples, n_features)) – The data matrix for which we want to get the confidence scores.

Returns

scores – Confidence scores per (n_samples, n_classes) combination.

Return type

array of shape (n_samples, n_classes)
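
For a linear estimator, the relationship between the confidence score and the signed distance can be sketched as follows. The sketch assumes a single sample that has already been transformed into the estimator's feature space; the coefficient values and function names are hypothetical.

```python
import math


def decision_value(x, coef, intercept):
    """Confidence score of a linear model: w . x + b."""
    return sum(w * xi for w, xi in zip(coef, x)) + intercept


def signed_distance(x, coef, intercept):
    """Signed distance to the hyperplane: (w . x + b) / ||w||."""
    norm = math.sqrt(sum(w * w for w in coef))
    return decision_value(x, coef, intercept) / norm


coef, intercept = [3.0, 4.0], -5.0  # hypothetical fitted values
value = decision_value([1.0, 3.0], coef, intercept)
distance = signed_distance([1.0, 3.0], coef, intercept)
```

The confidence score differs from the signed distance only by the constant factor ||w||, which is why the two are proportional.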

fit(X, y, sample_weight=None, metric_special=0, metric_missing=0, show_digits=2, check_input=False)

Fit scorecard.

Parameters
  • X (pandas.DataFrame (n_samples, n_features)) – Training vector, where n_samples is the number of samples.

  • y (array-like of shape (n_samples,)) – Target vector relative to X.

  • sample_weight (array-like of shape (n_samples,) (default=None)) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight. This option is only available for a binary target.

  • metric_special (float or str (default=0)) – The metric value to transform special codes in the input vector. Supported metrics are “empirical” to use the empirical WoE or event rate, and any numerical value.

  • metric_missing (float or str (default=0)) – The metric value to transform missing values in the input vector. Supported metrics are “empirical” to use the empirical WoE or event rate and any numerical value.

  • check_input (bool (default=False)) – Whether to check input arrays.

  • show_digits (int, optional (default=2)) – The number of significant digits of the bin column.

Returns

self – Fitted scorecard.

Return type

Scorecard

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

information(print_level=1)

Print overview information about the option settings and statistics.

Parameters

print_level (int (default=1)) – Level of details.

classmethod load(path)

Load scorecard from pickle file.

Parameters

path (str) – Pickle file path.

Example

>>> from optbinning import Scorecard
>>> scorecard = Scorecard.load("my_scorecard.pkl")

predict(X)

Predict using the fitted underlying estimator and the reduced dataset.

Parameters

X (pandas.DataFrame (n_samples, n_features)) – Input vector, where n_samples is the number of samples.

Returns

pred – The predicted target values.

Return type

array of shape (n_samples,)

predict_proba(X)

Predict class probabilities using the fitted underlying estimator and the reduced dataset.

Parameters

X (pandas.DataFrame (n_samples, n_features)) – Input vector, where n_samples is the number of samples.

Returns

p – The class probabilities of the input samples.

Return type

array of shape (n_samples, n_classes)

save(path)

Save scorecard to pickle file.

Parameters

path (str) – Pickle file path.

score(X)

Score of the dataset.

Parameters

X (pandas.DataFrame (n_samples, n_features)) – Input vector, where n_samples is the number of samples.

Returns

score – The score of the input samples.

Return type

array of shape (n_samples,)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance
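
The <component>__<parameter> convention can be illustrated with toy classes. This is a simplified sketch of the routing idea only: the classes `Component` and `Wrapper` are hypothetical, a single nesting level is handled, and no parameter validation is performed.

```python
class Component:
    """Hypothetical nested object with one parameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha


class Wrapper:
    """Hypothetical outer object holding a nested component."""

    def __init__(self, component, verbose=False):
        self.component = component
        self.verbose = verbose

    def set_params(self, **params):
        for key, value in params.items():
            # "component__alpha" -> ("component", "__", "alpha")
            name, delimiter, sub_name = key.partition("__")
            if delimiter:
                setattr(getattr(self, name), sub_name, value)
            else:
                setattr(self, name, value)
        return self


wrapper = Wrapper(Component()).set_params(verbose=True, component__alpha=0.1)
```

Keys without a double underscore update the outer object, while keys with one are forwarded to the named component.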

table(style='summary')

Scorecard table.

Parameters

style (str, optional (default="summary")) – Scorecard’s style. Supported styles are “summary” and “detailed”. Summary only includes columns variable, bin description and points. Detailed contains additional columns with bin information and estimator coefficients.

Returns

table – The scorecard table.

Return type

pandas.DataFrame

Monitoring

class optbinning.scorecard.ScorecardMonitoring(scorecard, psi_method='cart', psi_n_bins=20, psi_min_bin_size=0.05, show_digits=2, verbose=False)

Bases: sklearn.base.BaseEstimator

Scorecard monitoring.

Parameters
  • scorecard (object) – A Scorecard fitted instance.

  • psi_method (str, optional (default="cart")) – The binning method to compute the Population Stability Index (PSI). Supported methods are “cart” for a CART decision tree, “quantile” to generate prebins with approximately the same frequency and “uniform” to generate prebins with equal width. Method “cart” uses sklearn.tree.DecisionTreeClassifier.

  • psi_n_bins (int (default=20)) – The maximum number of bins to compute PSI.

  • psi_min_bin_size (float (default=0.05)) – The minimum fraction of records for each PSI bin.

  • show_digits (int, optional (default=2)) – The number of significant digits of the bin column.

  • verbose (bool (default=False)) – Enable verbose output.

fit(X_actual, y_actual, X_expected, y_expected)

Fit monitoring with actual and expected data.

Parameters
  • X_actual (pandas.DataFrame) – New/actual/test data input samples.

  • y_actual (array-like of shape (n_samples,)) – Target vector relative to X_actual.

  • X_expected (pandas.DataFrame) – Training data used for fitting the scorecard.

  • y_expected (array-like of shape (n_samples,)) – Target vector relative to X_expected.

Returns

self – Fitted monitoring.

Return type

ScorecardMonitoring

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

information(print_level=1)

Print overview information about the option settings and statistics.

Parameters

print_level (int (default=1)) – Level of details.

psi_plot(savefig=None)

Plot Population Stability Index (PSI).

Parameters

savefig (str or None (default=None)) – Path to save the plot figure.

property psi_splits

List of split points used to compute system PSI.

Returns

splits

Return type

numpy.ndarray

psi_table()

System Population Stability Index (PSI) table.

Returns

psi_table

Return type

pandas.DataFrame
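
As background for reading the PSI table, the conventional PSI definition over pre-binned counts can be sketched as below. The sketch assumes every bin is non-empty in both samples (an empty bin would cause a division by zero or a logarithm of zero); the function name `psi` and the example counts are hypothetical and do not come from the library.

```python
import math


def psi(expected_counts, actual_counts):
    """PSI over bins: sum((a - e) * ln(a / e)) on bin proportions."""
    total_expected = sum(expected_counts)
    total_actual = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        p_expected = e / total_expected
        p_actual = a / total_actual
        value += (p_actual - p_expected) * math.log(p_actual / p_expected)
    return value


stable = psi([100, 200, 300], [50, 100, 150])    # same proportions per bin
shifted = psi([100, 200, 300], [300, 200, 100])  # mass moved between bins
```

Identical bin proportions give a PSI of zero, and every bin contributes a non-negative term, so the index grows as the two distributions diverge.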

psi_variable_table(name=None, style='summary')

Population Stability Index (PSI) at variable level.

Parameters
  • name (str or None (default=None)) – The variable name. If name is None, a table with all variables is returned.

  • style (str, optional (default="summary")) – Supported styles are “summary” and “detailed”. Summary only includes the total PSI for each variable. Detailed includes the PSI for each variable at bin level.

Returns

psi_table

Return type

pandas.DataFrame

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

system_stability_report()

Print overview information and statistics about system stability. It includes qualitative suggestions regarding the necessity of scorecard updates.

tests_table()

Compute statistical tests to determine whether the event rate (Chi-square test, binary target) or the mean (Student’s t-test, continuous target) differs significantly between actual and expected data. The null hypothesis is actual == expected.

Returns

tests_table

Return type

pandas.DataFrame
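
For the binary-target case, a Chi-square comparison of two event rates can be sketched with the standard library alone. This is a minimal sketch of a Pearson Chi-square test on a 2x2 table without continuity correction; whether the library applies a correction or groups the data differently is not stated here, so the function name, its arguments, and the example counts are assumptions.

```python
import math


def chi_square_2x2(events_actual, total_actual, events_expected, total_expected):
    """Pearson Chi-square statistic and p-value for two event rates (1 dof)."""
    observed = [
        [events_actual, total_actual - events_actual],
        [events_expected, total_expected - events_expected],
    ]
    grand_total = total_actual + total_expected
    row_totals = [total_actual, total_expected]
    total_events = events_actual + events_expected
    col_totals = [total_events, grand_total - total_events]

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the Chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


same_stat, same_p = chi_square_2x2(10, 100, 20, 200)  # both event rates are 10%
diff_stat, diff_p = chi_square_2x2(40, 100, 20, 200)  # 40% versus 10%
```

A small p-value leads to rejecting the null hypothesis that the actual and expected event rates are equal.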

Plot functions

optbinning.scorecard.plot_auc_roc(y, y_pred, title=None, xlabel=None, ylabel=None, savefig=False, fname=None, **kwargs)

Plot Area Under the Receiver Operating Characteristic Curve (AUC ROC).

Parameters
  • y (array-like, shape = (n_samples,)) – Array with the target labels.

  • y_pred (array-like, shape = (n_samples,)) – Array with predicted probabilities.

  • title (str or None, optional (default=None)) – Title for the plot.

  • xlabel (str or None, optional (default=None)) – Label for the x-axis.

  • ylabel (str or None, optional (default=None)) – Label for the y-axis.

  • savefig (bool (default=False)) – Whether to save the figure.

  • fname (str or None, optional (default=None)) – Name for the figure file.

  • **kwargs (keyword arguments) – Keyword arguments for matplotlib.pyplot.savefig().
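
The quantity summarized by the plot can be computed directly from y and y_pred. The sketch below uses the rank interpretation of AUC (the probability that a randomly chosen event receives a higher predicted probability than a randomly chosen non-event, with ties counted as one half). It is a quadratic-time illustration with a hypothetical function name, not the routine used by the plotting function.

```python
def auc_roc(y, y_pred):
    """AUC as the fraction of (event, non-event) pairs ranked correctly."""
    events = [p for label, p in zip(y, y_pred) if label == 1]
    non_events = [p for label, p in zip(y, y_pred) if label == 0]
    wins = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                wins += 1.0
            elif p_event == p_non_event:
                wins += 0.5  # ties count as half a win
    return wins / (len(events) * len(non_events))


y = [0, 0, 1, 1]
y_pred = [0.1, 0.4, 0.35, 0.8]
auc = auc_roc(y, y_pred)
perfect = auc_roc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```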

optbinning.scorecard.plot_cap(y, y_pred, title=None, xlabel=None, ylabel=None, savefig=False, fname=None, **kwargs)

Plot Cumulative Accuracy Profile (CAP).

Parameters
  • y (array-like, shape = (n_samples,)) – Array with the target labels.

  • y_pred (array-like, shape = (n_samples,)) – Array with predicted probabilities.

  • title (str or None, optional (default=None)) – Title for the plot.

  • xlabel (str or None, optional (default=None)) – Label for the x-axis.

  • ylabel (str or None, optional (default=None)) – Label for the y-axis.

  • savefig (bool (default=False)) – Whether to save the figure.

  • fname (str or None, optional (default=None)) – Name for the figure file.

  • **kwargs (keyword arguments) – Keyword arguments for matplotlib.pyplot.savefig().
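
The points of a CAP curve can be sketched as follows, assuming the conventional construction: samples are sorted by predicted probability in descending order, and the curve tracks the cumulative fraction of events captured against the fraction of samples reviewed. The function name `cap_curve` is hypothetical, and the plotting function may differ in details such as tie handling.

```python
def cap_curve(y, y_pred):
    """Return (fraction of samples, fraction of events captured) for a CAP curve."""
    order = sorted(range(len(y)), key=lambda i: y_pred[i], reverse=True)
    total_events = sum(y)
    xs, ys = [0.0], [0.0]
    captured = 0
    for rank, index in enumerate(order, start=1):
        captured += y[index]
        xs.append(rank / len(y))
        ys.append(captured / total_events)
    return xs, ys


xs, ys = cap_curve([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```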

optbinning.scorecard.plot_ks(y, y_pred, title=None, xlabel=None, ylabel=None, savefig=False, fname=None, **kwargs)

Plot Kolmogorov-Smirnov (KS).

Parameters
  • y (array-like, shape = (n_samples,)) – Array with the target labels.

  • y_pred (array-like, shape = (n_samples,)) – Array with predicted probabilities.

  • title (str or None, optional (default=None)) – Title for the plot.

  • xlabel (str or None, optional (default=None)) – Label for the x-axis.

  • ylabel (str or None, optional (default=None)) – Label for the y-axis.

  • savefig (bool (default=False)) – Whether to save the figure.

  • fname (str or None, optional (default=None)) – Name for the figure file.

  • **kwargs (keyword arguments) – Keyword arguments for matplotlib.pyplot.savefig().
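
The Kolmogorov-Smirnov statistic shown in the plot is the largest gap between the cumulative distributions of predicted probabilities for events and non-events. The sketch below computes it directly; the function name `ks_statistic` is hypothetical and the implementation favors clarity over speed.

```python
def ks_statistic(y, y_pred):
    """Maximum gap between the score CDFs of events and non-events."""
    n_events = sum(1 for label in y if label == 1)
    n_non_events = len(y) - n_events
    best = 0.0
    for threshold in sorted(set(y_pred)):
        cdf_events = sum(
            1 for label, p in zip(y, y_pred) if label == 1 and p <= threshold
        ) / n_events
        cdf_non_events = sum(
            1 for label, p in zip(y, y_pred) if label == 0 and p <= threshold
        ) / n_non_events
        best = max(best, abs(cdf_events - cdf_non_events))
    return best


ks = ks_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
ks_perfect = ks_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```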