Scorecard¶

class optbinning.scorecard.Scorecard(binning_process, estimator, scaling_method=None, scaling_method_params=None, intercept_based=False, reverse_scorecard=False, rounding=False, verbose=False)¶

Bases: optbinning.binning.base.Base, sklearn.base.BaseEstimator

Scorecard development given a binary or continuous target dtype.

Parameters

binning_process (object) – A BinningProcess instance.
estimator (object) – A supervised learning estimator with a fit and predict method that provides information about feature coefficients through a coef_ attribute. For binary classification, the estimator must include a predict_proba method.
scaling_method (str or None (default=None)) – The scaling method to control the range of the scores. Supported methods are “pdo_odds” and “min_max”. Method “pdo_odds” is only applicable for binary classification. If None, no scaling is applied.
scaling_method_params (dict or None (default=None)) – Dictionary with scaling method parameters. If scaling_method="pdo_odds" parameters required are: “pdo”, “odds”, and “scorecard_points”. If scaling_method="min_max" parameters required are “min” and “max”. If scaling_method=None, this parameter is not used.
intercept_based (bool (default=False)) – Build an intercept-based scorecard. An intercept-based scorecard modifies the original scorecard by setting the smallest point for each variable to zero and updating the intercept accordingly.
reverse_scorecard (bool (default=False)) – Whether to reverse the direction of the relationship between predictions and scorecard points (ascending versus descending).
rounding (bool (default=False)) – Whether to round scorecard points. If scaling_method="min_max" a mixed-integer programming problem is solved to guarantee the minimum/maximum score after rounding. Otherwise, the scorecard points are rounded to the nearest integer.
verbose (bool (default=False)) – Enable verbose output.
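
To make the “pdo_odds” parameters concrete, the sketch below applies the conventional credit-scoring rule in which a score of scorecard_points corresponds to the reference odds, and every doubling of the odds adds pdo points. The function name pdo_odds_score is hypothetical, and the sign and direction conventions are assumptions; the library's internal computation may differ.

```python
import math


def pdo_odds_score(log_odds, pdo, odds, scorecard_points):
    """Map a log-odds value to a score (hypothetical helper, not library code)."""
    # Points gained per unit of log-odds, so that doubling the odds adds `pdo`.
    factor = pdo / math.log(2)
    # Shift chosen so that the reference odds map exactly to `scorecard_points`.
    offset = scorecard_points - factor * math.log(odds)
    return offset + factor * log_odds


params = {"pdo": 20, "odds": 50, "scorecard_points": 600}
base = pdo_odds_score(math.log(50), **params)      # score at the reference odds
doubled = pdo_odds_score(math.log(100), **params)  # score at twice those odds
```

With these illustrative values, base is approximately 600 and doubled is approximately 620, up to floating-point error.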

binning_process_¶

The external binning process.

Type: object

estimator_¶

The external estimator fit on the reduced dataset.

Type: object

intercept_¶

The intercept if intercept_based=True.

Type: float
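
The intercept-based transformation described for intercept_based can be sketched as follows. This is a minimal illustration written from that description, not the library's code; to_intercept_based and the sample points are hypothetical.

```python
def to_intercept_based(points_by_variable, intercept=0.0):
    """Shift each variable so its smallest point is zero; move the shift to the intercept."""
    shifted = {}
    new_intercept = intercept
    for variable, points in points_by_variable.items():
        smallest = min(points)
        shifted[variable] = [p - smallest for p in points]
        new_intercept += smallest  # keeps every total score unchanged
    return shifted, new_intercept


points = {"age": [12.0, 25.0, 40.0], "income": [-5.0, 10.0]}
shifted, intercept = to_intercept_based(points)
```

Total scores are preserved: a sample in the second age bin and the first income bin scores 25 + (-5) = 20 before the transformation and 13 + 0 + 7 = 20 after it.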

decision_function(X)¶

Predict confidence scores for samples. The confidence score for a sample is proportional to the signed distance of that sample to the hyperplane.

Parameters: X (pandas.DataFrame (n_samples, n_features)) – The data matrix for which we want to get the confidence scores.
Returns: scores – Confidence scores per (n_samples, n_classes) combination.
Return type: array of shape (n_samples, n_classes)

fit(X, y, sample_weight=None, metric_special=0, metric_missing=0, show_digits=2, check_input=False)¶

Fit scorecard.

Parameters

X (pandas.DataFrame (n_samples, n_features)) – Training vector, where n_samples is the number of samples.
y (array-like of shape (n_samples,)) – Target vector relative to X.
sample_weight (array-like of shape (n_samples,) (default=None)) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight. This option is only available for a binary target.
metric_special (float or str (default=0)) – The metric value to transform special codes in the input vector. Supported metrics are “empirical” to use the empirical WoE or event rate, and any numerical value.
metric_missing (float or str (default=0)) – The metric value to transform missing values in the input vector. Supported metrics are “empirical” to use the empirical WoE or event rate and any numerical value.
check_input (bool (default=False)) – Whether to check input arrays.
show_digits (int, optional (default=2)) – The number of significant digits of the bin column.

Returns

self – Fitted scorecard.

Return type

Scorecard

get_params(deep=True)¶

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

information(print_level=1)¶

Print overview information about the options settings and statistics.

Parameters: print_level (int (default=1)) – Level of details.

classmethod load(path)¶

Load scorecard from pickle file.

Parameters: path (str) – Pickle file path.

Example

>>> from optbinning import Scorecard
>>> scorecard = Scorecard.load("my_scorecard.pkl")

predict(X)¶

Predict using the fitted underlying estimator and the reduced dataset.

Parameters: X (pandas.DataFrame (n_samples, n_features)) – Input samples, where n_samples is the number of samples.
Returns: pred – The predicted target values.
Return type: array of shape (n_samples,)

predict_proba(X)¶

Predict class probabilities using the fitted underlying estimator and the reduced dataset.

Parameters: X (pandas.DataFrame (n_samples, n_features)) – Input samples, where n_samples is the number of samples.
Returns: p – The class probabilities of the input samples.
Return type: array of shape (n_samples, n_classes)

save(path)¶

Save scorecard to pickle file.

Parameters: path (str) – Pickle file path.
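
The save/load round trip can be pictured with the standard pickle module. This is only a sketch under the assumption that both methods are thin wrappers around pickle; save_object, load_object, and the dictionary standing in for a fitted scorecard are hypothetical.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Serialize any picklable object to `path`."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_object(path):
    """Deserialize an object previously written with save_object."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A plain dict stands in for a fitted scorecard in this illustration.
stand_in = {"scaling_method": "min_max", "points": [0, 13, 28]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_scorecard.pkl")
    save_object(stand_in, path)
    restored = load_object(path)
```

Because unpickling can execute arbitrary code, pickle files should only be loaded from trusted sources.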

score(X)¶

Score of the dataset.

Parameters: X (pandas.DataFrame (n_samples, n_features)) – Input samples, where n_samples is the number of samples.
Returns: score – The score of the input samples.
Return type: array of shape (n_samples,)
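
Conceptually, a scorecard score is the sum of the points of the bin each variable falls into, plus any intercept. The sketch below illustrates that lookup with the standard library; score_sample, the split points, and the point values are hypothetical, and right-open bins [a, b) are an assumption.

```python
from bisect import bisect_right


def score_sample(sample, scorecard, intercept=0.0):
    """Sum the points of the bin each variable falls into (illustrative only)."""
    total = intercept
    for variable, (splits, points) in scorecard.items():
        # bisect_right sends a value equal to a split into the upper bin,
        # which corresponds to right-open bins [a, b).
        bin_index = bisect_right(splits, sample[variable])
        total += points[bin_index]
    return total


scorecard = {
    "age": ([30, 50], [10.0, 25.0, 40.0]),  # bins: (-inf, 30), [30, 50), [50, inf)
    "income": ([2000], [5.0, 20.0]),        # bins: (-inf, 2000), [2000, inf)
}
samples = [{"age": 25, "income": 1500}, {"age": 50, "income": 2000}]
scores = [score_sample(s, scorecard) for s in samples]
```

The first sample collects 10 + 5 = 15 points and the second 40 + 20 = 60.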

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

table(style='summary')¶

Scorecard table.

Parameters: style (str, optional (default="summary")) – Scorecard’s style. Supported styles are “summary” and “detailed”. Summary only includes the columns variable, bin description and points. Detailed includes additional columns with bin information and estimator coefficients.
Returns: table – The scorecard table.
Return type: pandas.DataFrame
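
The points column depends on the chosen scaling method. As an illustration of scaling_method="min_max", the sketch below linearly rescales raw points so that the lowest attainable total equals “min” and the highest equals “max”. It is written from the parameter description only; min_max_rescale and the raw values are hypothetical, and splitting the shift evenly across variables is an assumption.

```python
def min_max_rescale(points_by_variable, new_min, new_max):
    """Linearly rescale points so attainable totals span [new_min, new_max]."""
    lowest = sum(min(p) for p in points_by_variable.values())
    highest = sum(max(p) for p in points_by_variable.values())
    slope = (new_max - new_min) / (highest - lowest)
    # Spread the constant shift evenly across variables (an assumption).
    shift = (new_min - slope * lowest) / len(points_by_variable)
    return {
        variable: [slope * p + shift for p in points]
        for variable, points in points_by_variable.items()
    }


raw = {"age": [-0.4, 0.1, 0.6], "income": [-0.3, 0.5]}
scaled = min_max_rescale(raw, new_min=300, new_max=850)
lowest_total = sum(min(p) for p in scaled.values())
highest_total = sum(max(p) for p in scaled.values())
```

Here lowest_total and highest_total come out at 300 and 850, up to floating-point error.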

Monitoring¶

class optbinning.scorecard.ScorecardMonitoring(scorecard, psi_method='cart', psi_n_bins=20, psi_min_bin_size=0.05, show_digits=2, verbose=False)¶

Bases: sklearn.base.BaseEstimator

Scorecard monitoring.

Parameters

scorecard (object) – A Scorecard fitted instance.
psi_method (str, optional (default="cart")) – The binning method to compute the Population Stability Index (PSI). Supported methods are “cart” for a CART decision tree, “quantile” to generate prebins with approximately the same frequency and “uniform” to generate prebins with equal width. Method “cart” uses sklearn.tree.DecisionTreeClassifier.
psi_n_bins (int (default=20)) – The maximum number of bins to compute PSI.
psi_min_bin_size (float (default=0.05)) – The minimum fraction of records required in each PSI bin.
show_digits (int, optional (default=2)) – The number of significant digits of the bin column.
verbose (bool (default=False)) – Enable verbose output.
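
For reference, the PSI between two binned distributions is commonly defined as the sum over bins of (actual − expected) × ln(actual / expected), computed on bin proportions. The sketch below implements that textbook formula; psi and the bin counts are hypothetical, and every bin is assumed to be non-empty in both samples.

```python
import math


def psi(expected_counts, actual_counts):
    """Population Stability Index from per-bin record counts (textbook formula)."""
    total_expected = sum(expected_counts)
    total_actual = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        p_expected = e / total_expected
        p_actual = a / total_actual
        # Each term is non-negative; empty bins would break the logarithm.
        value += (p_actual - p_expected) * math.log(p_actual / p_expected)
    return value


expected = [100, 200, 300, 400]
stable = psi(expected, [100, 200, 300, 400])   # identical distributions
shifted = psi(expected, [150, 250, 300, 300])  # population has drifted
```

Identical distributions give a PSI of zero, and the value grows as the actual distribution drifts away from the expected one.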

fit(X_actual, y_actual, X_expected, y_expected)¶

Fit monitoring with actual and expected data.

Parameters

X_actual (pandas.DataFrame) – New/actual/test data input samples.
y_actual (array-like of shape (n_samples,)) – Target vector relative to X_actual.
X_expected (pandas.DataFrame) – Training data used for fitting the scorecard.
y_expected (array-like of shape (n_samples,)) – Target vector relative to X_expected.

Returns

self – Fitted monitoring.

Return type

ScorecardMonitoring

get_params(deep=True)¶

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

information(print_level=1)¶

Print overview information about the options settings and statistics.

Parameters: print_level (int (default=1)) – Level of details.

psi_plot(savefig=None)¶

Plot Population Stability Index (PSI).

Parameters: savefig (str or None (default=None)) – Path to save the plot figure.

property psi_splits¶

List of split points used to compute the system PSI.

Returns: splits
Return type: numpy.ndarray

psi_table()¶

System Population Stability Index (PSI) table.

Returns: psi_table
Return type: pandas.DataFrame

psi_variable_table(name=None, style='summary')¶

Population Stability Index (PSI) at variable level.

Parameters

name (str or None (default=None)) – The variable name. If name is None, a table with all variables is returned.
style (str, optional (default="summary")) – Supported styles are “summary” and “detailed”. Summary only includes the total PSI for each variable. Detailed includes the PSI for each variable at bin level.

Returns

psi_table

Return type

pandas.DataFrame

set_params(**params)¶

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

system_stability_report()¶

Print overview information and statistics about system stability. It includes qualitative suggestions regarding the necessity of scorecard updates.

tests_table()¶

Compute statistical tests to determine whether the event rate (Chi-square test, binary target) or the mean (Student’s t-test, continuous target) differs significantly between the actual and expected data. The null hypothesis is that they are equal (actual == expected).

Returns: tests_table
Return type: pandas.DataFrame
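
As an illustration of the binary-target case, the sketch below runs a Pearson chi-square test on a 2×2 table of events and non-events for the actual and expected samples. It is a standard-library approximation, not the library's implementation: chi2_event_rate_test is hypothetical, no continuity correction is applied, and the p-value uses the identity that holds for one degree of freedom.

```python
import math


def chi2_event_rate_test(events_actual, n_actual, events_expected, n_expected):
    """Pearson chi-square test (1 degree of freedom, no continuity correction)."""
    a, b = events_actual, n_actual - events_actual
    c, d = events_expected, n_expected - events_expected
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


same_stat, same_p = chi2_event_rate_test(10, 100, 20, 200)  # both rates are 10%
diff_stat, diff_p = chi2_event_rate_test(30, 100, 20, 200)  # 30% versus 10%
```

A large p-value (first case) gives no evidence against the null hypothesis, while a small one (second case) suggests the event rates differ.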

Plot functions¶

optbinning.scorecard.plot_auc_roc(y, y_pred, title=None, xlabel=None, ylabel=None, savefig=False, fname=None, **kwargs)¶

Plot Area Under the Receiver Operating Characteristic Curve (AUC ROC).

Parameters

y (array-like, shape = (n_samples,)) – Array with the target labels.
y_pred (array-like, shape = (n_samples,)) – Array with predicted probabilities.
title (str or None, optional (default=None)) – Title for the plot.
xlabel (str or None, optional (default=None)) – Label for the x-axis.
ylabel (str or None, optional (default=None)) – Label for the y-axis.
savefig (bool (default=False)) – Whether to save the figure.
fname (str or None, optional (default=None)) – Name for the figure file.
**kwargs (keyword arguments) – Keyword arguments for matplotlib.pyplot.savefig().
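
The quantity summarized by this plot can be computed directly: the AUC equals the probability that a randomly chosen event receives a higher predicted probability than a randomly chosen non-event, with ties counting one half. The sketch below uses that pairwise definition; auc_roc is a hypothetical helper, it is quadratic in the sample size, and it assumes labels coded as 0 and 1.

```python
def auc_roc(y, y_pred):
    """AUC via pairwise comparison of events (1) against non-events (0)."""
    positives = [p for label, p in zip(y, y_pred) if label == 1]
    negatives = [p for label, p in zip(y, y_pred) if label == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


y = [0, 0, 1, 1]
y_pred = [0.1, 0.4, 0.35, 0.8]
auc = auc_roc(y, y_pred)
```

Three of the four event/non-event pairs are ordered correctly here, so auc is 0.75.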

optbinning.scorecard.plot_cap(y, y_pred, title=None, xlabel=None, ylabel=None, savefig=False, fname=None, **kwargs)¶

Plot Cumulative Accuracy Profile (CAP).

Parameters

y (array-like, shape = (n_samples,)) – Array with the target labels.
y_pred (array-like, shape = (n_samples,)) – Array with predicted probabilities.
title (str or None, optional (default=None)) – Title for the plot.
xlabel (str or None, optional (default=None)) – Label for the x-axis.
ylabel (str or None, optional (default=None)) – Label for the y-axis.
savefig (bool (default=False)) – Whether to save the figure.
fname (str or None, optional (default=None)) – Name for the figure file.
**kwargs (keyword arguments) – Keyword arguments for matplotlib.pyplot.savefig().
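
A CAP curve plots the cumulative share of events captured against the share of the population reviewed, after sorting samples from highest to lowest predicted probability. The sketch below computes the curve points under that standard definition; cap_curve is a hypothetical helper and assumes labels coded as 0 and 1.

```python
def cap_curve(y, y_pred):
    """Return (population share, captured event share) points of the CAP curve."""
    # Review samples from the highest predicted probability to the lowest.
    order = sorted(range(len(y)), key=lambda i: y_pred[i], reverse=True)
    total_events = sum(y)
    xs, ys = [0.0], [0.0]
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += y[i]
        xs.append(rank / len(y))
        ys.append(captured / total_events)
    return xs, ys


y = [0, 0, 1, 1]
y_pred = [0.1, 0.4, 0.35, 0.8]
xs, ys = cap_curve(y, y_pred)
```

A perfect model would capture all events within the first half of this population; the example needs three quarters of it.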

optbinning.scorecard.plot_ks(y, y_pred, title=None, xlabel=None, ylabel=None, savefig=False, fname=None, **kwargs)¶

Plot Kolmogorov-Smirnov (KS).

Parameters

y (array-like, shape = (n_samples,)) – Array with the target labels.
y_pred (array-like, shape = (n_samples,)) – Array with predicted probabilities.
title (str or None, optional (default=None)) – Title for the plot.
xlabel (str or None, optional (default=None)) – Label for the x-axis.
ylabel (str or None, optional (default=None)) – Label for the y-axis.
savefig (bool (default=False)) – Whether to save the figure.
fname (str or None, optional (default=None)) – Name for the figure file.
**kwargs (keyword arguments) – Keyword arguments for matplotlib.pyplot.savefig().
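
The KS statistic behind this plot is the largest gap between the cumulative distributions of predicted probabilities for events and for non-events. The sketch below computes it with the standard library; ks_statistic is a hypothetical helper and assumes labels coded as 0 and 1, with both classes present.

```python
def ks_statistic(y, y_pred):
    """Maximum gap between the cumulative distributions of events and non-events."""
    n_events = sum(y)
    n_nonevents = len(y) - n_events
    pairs = sorted(zip(y_pred, y))
    cum_events = cum_nonevents = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all samples tied at this threshold before comparing.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                cum_events += 1
            else:
                cum_nonevents += 1
            i += 1
        gap = abs(cum_events / n_events - cum_nonevents / n_nonevents)
        best = max(best, gap)
    return best


y = [0, 0, 1, 1]
y_pred = [0.1, 0.4, 0.35, 0.8]
ks = ks_statistic(y, y_pred)
```

For this small example the two cumulative distributions are at most 0.5 apart.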