Scorecard
-
class optbinning.scorecard.Scorecard(binning_process, estimator, scaling_method=None, scaling_method_params=None, intercept_based=False, reverse_scorecard=False, rounding=False, verbose=False)
Bases: optbinning.binning.base.Base, sklearn.base.BaseEstimator
Scorecard development given a binary or continuous target dtype.
- Parameters
binning_process (object) – A BinningProcess instance.
estimator (object) – A supervised learning estimator with fit and predict methods that provides information about feature coefficients through a coef_ attribute. For binary classification, the estimator must also include a predict_proba method.
scaling_method (str or None (default=None)) – The scaling method to control the range of the scores. Supported methods are “pdo_odds” and “min_max”. Method “pdo_odds” is only applicable for binary classification. If None, no scaling is applied.
scaling_method_params (dict or None (default=None)) – Dictionary with scaling method parameters. If scaling_method="pdo_odds", the required parameters are “pdo”, “odds”, and “scorecard_points”. If scaling_method="min_max", the required parameters are “min” and “max”. If scaling_method=None, this parameter is not used.
intercept_based (bool (default=False)) – Build an intercept-based scorecard. An intercept-based scorecard modifies the original scorecard by setting the smallest point for each variable to zero and updating the intercept accordingly.
reverse_scorecard (bool (default=False)) – Whether to reverse the direction of the relationship between predictions and scorecard points (ascending or descending).
rounding (bool (default=False)) – Whether to round scorecard points. If scaling_method="min_max", a mixed-integer programming problem is solved to guarantee the minimum/maximum score after rounding. Otherwise, the scorecard points are rounded to the nearest integer.
verbose (bool (default=False)) – Enable verbose output.
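As a rough illustration of the two scaling methods named above, the sketch below applies the conventional points-to-double-the-odds (PDO) formula and a linear min/max rescaling to plain numbers. It is not taken from the optbinning source: the helper names `pdo_odds_score` and `min_max_score`, the sample values, and the sign convention (higher odds give a higher score) are assumptions made for this example.

```python
import math


def pdo_odds_score(sample_odds, pdo, odds, scorecard_points):
    """Conventional PDO scaling: `scorecard_points` at `odds`, plus `pdo` per doubling."""
    factor = pdo / math.log(2)
    offset = scorecard_points - factor * math.log(odds)
    return offset + factor * math.log(sample_odds)


def min_max_score(raw, raw_min, raw_max, new_min, new_max):
    """Linearly map a raw score from [raw_min, raw_max] to [new_min, new_max]."""
    scale = (new_max - new_min) / (raw_max - raw_min)
    return new_min + (raw - raw_min) * scale


# Same keys as the "pdo_odds" entry of scaling_method_params; values are hypothetical.
params = {"pdo": 20, "odds": 50, "scorecard_points": 600}
base = pdo_odds_score(50, **params)      # score at the reference odds
doubled = pdo_odds_score(100, **params)  # score after the odds double

# Hypothetical raw score in [-1, 1] mapped to the range [300, 850].
rescaled = min_max_score(0.25, raw_min=-1.0, raw_max=1.0, new_min=300, new_max=850)
```

Under these assumptions, a sample at the reference odds receives `scorecard_points`, and each doubling of the odds adds `pdo` points.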
-
binning_process_
¶ The external binning process.
- Type
object
-
estimator_
¶ The external estimator fit on the reduced dataset.
- Type
object
-
intercept_
¶ The intercept if intercept_based=True.
- Type
float
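The intercept-based transformation described for `intercept_based` can be sketched on a plain dictionary of points. This is a minimal illustration under assumed data, not the library's implementation: the function name `to_intercept_based` and the variables "age" and "income" are hypothetical.

```python
def to_intercept_based(points_by_variable, intercept=0.0):
    """Shift each variable so its smallest point is zero; move the shift into the intercept."""
    shifted = {}
    for variable, points in points_by_variable.items():
        smallest = min(points)
        shifted[variable] = [p - smallest for p in points]
        intercept += smallest
    return shifted, intercept


# Hypothetical points per bin for two variables.
points = {"age": [12.0, 30.0, 45.0], "income": [-5.0, 10.0, 25.0]}
shifted, intercept = to_intercept_based(points)

# A sample in bin 1 of "age" and bin 2 of "income" keeps the same total score.
original_total = points["age"][1] + points["income"][2]
shifted_total = intercept + shifted["age"][1] + shifted["income"][2]
```

The total score of any sample is unchanged because every amount subtracted from a variable's points is added back through the intercept.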
-
decision_function
(X)¶ Predict confidence scores for samples. The confidence score for a sample is proportional to the signed distance of that sample to the hyperplane.
- Parameters
X (pandas.DataFrame (n_samples, n_features)) – The data matrix for which we want to get the confidence scores.
- Returns
scores – Confidence scores per (n_samples, n_classes) combination.
- Return type
array of shape (n_samples, n_classes)
-
fit
(X, y, sample_weight=None, metric_special=0, metric_missing=0, show_digits=2, check_input=False)¶ Fit scorecard.
- Parameters
X (pandas.DataFrame (n_samples, n_features)) – Training vector, where n_samples is the number of samples.
y (array-like of shape (n_samples,)) – Target vector relative to X.
sample_weight (array-like of shape (n_samples,) (default=None)) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight. This option is only available for a binary target.
metric_special (float or str (default=0)) – The metric value to transform special codes in the input vector. Supported metrics are “empirical” to use the empirical WoE or event rate, and any numerical value.
metric_missing (float or str (default=0)) – The metric value to transform missing values in the input vector. Supported metrics are “empirical” to use the empirical WoE or event rate, and any numerical value.
check_input (bool (default=False)) – Whether to check input arrays.
show_digits (int, optional (default=2)) – The number of significant digits of the bin column.
- Returns
self – Fitted scorecard.
- Return type
Scorecard
-
get_params
(deep=True)¶ Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
-
information
(print_level=1)¶ Print overview information about the options settings and statistics.
- Parameters
print_level (int (default=1)) – Level of details.
-
classmethod
load
(path)¶ Load scorecard from pickle file.
- Parameters
path (str) – Pickle file path.
Example
>>> from optbinning import Scorecard
>>> scorecard = Scorecard.load("my_scorecard.pkl")
-
predict
(X)¶ Predict using the fitted underlying estimator and the reduced dataset.
- Parameters
X (pandas.DataFrame (n_samples, n_features)) – Training vector, where n_samples is the number of samples.
- Returns
pred – The predicted target values.
- Return type
array of shape (n_samples)
-
predict_proba
(X)¶ Predict class probabilities using the fitted underlying estimator and the reduced dataset.
- Parameters
X (pandas.DataFrame (n_samples, n_features)) – Training vector, where n_samples is the number of samples.
- Returns
p – The class probabilities of the input samples.
- Return type
array of shape (n_samples, n_classes)
-
save
(path)¶ Save scorecard to pickle file.
- Parameters
path (str) – Pickle file path.
-
score
(X)¶ Score of the dataset.
- Parameters
X (pandas.DataFrame (n_samples, n_features)) – Training vector, where n_samples is the number of samples.
- Returns
score – The score of the input samples.
- Return type
array of shape (n_samples)
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
-
table
(style='summary')¶ Scorecard table.
- Parameters
style (str, optional (default="summary")) – Scorecard’s style. Supported styles are “summary” and “detailed”. Summary only includes the columns variable, bin description and points. Detailed also contains additional columns with bin information and estimator coefficients.
- Returns
table – The scorecard table.
- Return type
pandas.DataFrame
Monitoring
-
class optbinning.scorecard.ScorecardMonitoring(scorecard, psi_method='cart', psi_n_bins=20, psi_min_bin_size=0.05, show_digits=2, verbose=False)
Bases: sklearn.base.BaseEstimator
Scorecard monitoring.
- Parameters
scorecard (object) – A fitted Scorecard instance.
psi_method (str, optional (default="cart")) – The binning method to compute the Population Stability Index (PSI). Supported methods are “cart” for a CART decision tree, “quantile” to generate prebins with approximately the same frequency and “uniform” to generate prebins with equal width. Method “cart” uses sklearn.tree.DecisionTreeClassifier.
psi_n_bins (int (default=20)) – The maximum number of bins to compute PSI.
psi_min_bin_size (float (default=0.05)) – The minimum fraction of records in each PSI bin.
show_digits (int, optional (default=2)) – The number of significant digits of the bin column.
verbose (bool (default=False)) – Enable verbose output.
-
fit
(X_actual, y_actual, X_expected, y_expected)¶ Fit monitoring with actual and expected data.
- Parameters
X_actual (pandas.DataFrame) – New/actual/test data input samples.
y_actual (array-like of shape (n_samples,)) – Target vector relative to X_actual.
X_expected (pandas.DataFrame) – Training data used for fitting the scorecard.
y_expected (array-like of shape (n_samples,)) – Target vector relative to X_expected.
- Returns
self – Fitted monitoring.
- Return type
ScorecardMonitoring
-
get_params
(deep=True)¶ Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
-
information
(print_level=1)¶ Print overview information about the options settings and statistics.
- Parameters
print_level (int (default=1)) – Level of details.
-
psi_plot
(savefig=None)¶ Plot Population Stability Index (PSI).
- Parameters
savefig (str or None (default=None)) – Path to save the plot figure.
-
property
psi_splits
¶ List of split points used to compute system PSI.
- Returns
splits
- Return type
numpy.ndarray
-
psi_table
()¶ System Population Stability Index (PSI) table.
- Returns
psi_table
- Return type
pandas.DataFrame
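For readers unfamiliar with the metric, the sketch below computes a Population Stability Index from two score samples using the standard definition, the sum over bins of (actual share minus expected share) times the log of their ratio. It is an illustrative calculation only: the split points and scores are hypothetical, the helper names are not part of optbinning, and the sketch assumes no bin is empty in either sample.

```python
import math
from bisect import bisect_right


def bin_proportions(scores, splits):
    """Share of scores falling in each bin defined by sorted split points."""
    counts = [0] * (len(splits) + 1)
    for s in scores:
        counts[bisect_right(splits, s)] += 1
    return [c / len(scores) for c in counts]


def psi(actual, expected, splits):
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    a = bin_proportions(actual, splits)
    e = bin_proportions(expected, splits)
    # Assumes every bin is non-empty in both samples; empty bins would need smoothing.
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


splits = [400, 500, 600]  # hypothetical split points on the score scale
expected = [350, 420, 480, 510, 560, 590, 620, 700]
actual = [380, 390, 450, 520, 530, 610, 650, 660]

value = psi(actual, expected, splits)
same = psi(expected, expected, splits)  # identical distributions give zero
```

Every term of the sum is non-negative, so the index is zero only when the two binned distributions coincide.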
-
psi_variable_table
(name=None, style='summary')¶ Population Stability Index (PSI) at variable level.
- Parameters
name (str or None (default=None)) – The variable name. If name is None, a table with all variables is returned.
style (str, optional (default="summary")) – Supported styles are “summary” and “detailed”. Summary only includes the total PSI for each variable. Detailed includes the PSI for each variable at bin level.
- Returns
psi_table
- Return type
pandas.DataFrame
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
-
system_stability_report
()¶ Print overview information and statistics about system stability. It includes qualitative suggestions regarding the necessity of scorecard updates.
-
tests_table
()¶ Compute statistical tests to determine whether the event rate (Chi-square test, binary target) or the mean (Student’s t-test, continuous target) differs significantly between actual and expected data. The null hypothesis is actual == expected.
- Returns
tests_table
- Return type
pandas.DataFrame
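To make the binary-target case concrete, the following sketch runs a Pearson chi-square test on the event counts of an actual and an expected sample. It is an assumption-laden illustration rather than the library's code: the function name and the counts are hypothetical, and no continuity correction is applied, so results may differ from what `tests_table` reports.

```python
import math


def chi2_event_rate_test(events_a, n_a, events_e, n_e):
    """Pearson chi-square test on a 2x2 table (1 degree of freedom, no correction)."""
    observed = [
        [events_a, n_a - events_a],
        [events_e, n_e - events_e],
    ]
    total = n_a + n_e
    row_totals = [n_a, n_e]
    col_totals = [events_a + events_e, total - events_a - events_e]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected_count = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected_count) ** 2 / expected_count
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 30 events in 200 actual records, 20 events in 200 expected records.
stat, p_value = chi2_event_rate_test(events_a=30, n_a=200, events_e=20, n_e=200)
same_stat, same_p = chi2_event_rate_test(20, 200, 20, 200)
```

A small p-value would suggest rejecting the null hypothesis that the actual and expected event rates are equal.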
Plot functions
-
optbinning.scorecard.
plot_auc_roc
(y, y_pred, title=None, xlabel=None, ylabel=None, savefig=False, fname=None, **kwargs)¶ Plot Area Under the Receiver Operating Characteristic Curve (AUC ROC).
- Parameters
y (array-like, shape = (n_samples,)) – Array with the target labels.
y_pred (array-like, shape = (n_samples,)) – Array with predicted probabilities.
title (str or None, optional (default=None)) – Title for the plot.
xlabel (str or None, optional (default=None)) – Label for the x-axis.
ylabel (str or None, optional (default=None)) – Label for the y-axis.
savefig (bool (default=False)) – Whether to save the figure.
fname (str or None, optional (default=None)) – Name for the figure file.
**kwargs (keyword arguments) – Keyword arguments for matplotlib.pyplot.savefig().
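The quantity summarized by this plot can be computed directly from `y` and `y_pred`. The sketch below uses the pairwise definition of AUC (the probability that a randomly chosen event is scored above a randomly chosen non-event); it is a simple quadratic-time illustration with made-up data, not the routine used by `plot_auc_roc`.

```python
def auc_roc(y, y_pred):
    """AUC as the probability that a random event outranks a random non-event."""
    pos = [p for t, p in zip(y, y_pred) if t == 1]
    neg = [p for t, p in zip(y, y_pred) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities.
y = [0, 0, 1, 1]
y_pred = [0.1, 0.4, 0.35, 0.8]

auc = auc_roc(y, y_pred)
gini = 2 * auc - 1  # Gini coefficient derived from the AUC
```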
-
optbinning.scorecard.
plot_cap
(y, y_pred, title=None, xlabel=None, ylabel=None, savefig=False, fname=None, **kwargs)¶ Plot Cumulative Accuracy Profile (CAP).
- Parameters
y (array-like, shape = (n_samples,)) – Array with the target labels.
y_pred (array-like, shape = (n_samples,)) – Array with predicted probabilities.
title (str or None, optional (default=None)) – Title for the plot.
xlabel (str or None, optional (default=None)) – Label for the x-axis.
ylabel (str or None, optional (default=None)) – Label for the y-axis.
savefig (bool (default=False)) – Whether to save the figure.
fname (str or None, optional (default=None)) – Name for the figure file.
**kwargs (keyword arguments) – Keyword arguments for matplotlib.pyplot.savefig().
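A CAP curve ranks records by descending predicted probability and tracks the cumulative share of events captured. The sketch below builds those points and the associated accuracy ratio with the trapezoidal rule; the function names and data are hypothetical, and the code is only meant to show what the plot represents, not how `plot_cap` is implemented.

```python
def cap_curve(y, y_pred):
    """Return (population fraction, captured-event fraction) points of the CAP curve."""
    order = sorted(range(len(y)), key=lambda i: y_pred[i], reverse=True)
    total_events = sum(y)
    xs, ys, captured = [0.0], [0.0], 0
    for rank, i in enumerate(order, start=1):
        captured += y[i]
        xs.append(rank / len(y))
        ys.append(captured / total_events)
    return xs, ys


def accuracy_ratio(y, y_pred):
    """Area between model and random curves over area between perfect and random."""
    xs, ys = cap_curve(y, y_pred)
    area = sum(
        (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2 for k in range(1, len(xs))
    )
    event_rate = sum(y) / len(y)
    perfect_area = 1 - event_rate / 2  # perfect model captures all events first
    return (area - 0.5) / (perfect_area - 0.5)


# Hypothetical labels and predicted probabilities.
y = [0, 0, 1, 1]
y_pred = [0.1, 0.4, 0.35, 0.8]

xs, ys = cap_curve(y, y_pred)
ar = accuracy_ratio(y, y_pred)
```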
-
optbinning.scorecard.
plot_ks
(y, y_pred, title=None, xlabel=None, ylabel=None, savefig=False, fname=None, **kwargs)¶ Plot Kolmogorov-Smirnov (KS).
- Parameters
y (array-like, shape = (n_samples,)) – Array with the target labels.
y_pred (array-like, shape = (n_samples,)) – Array with predicted probabilities.
title (str or None, optional (default=None)) – Title for the plot.
xlabel (str or None, optional (default=None)) – Label for the x-axis.
ylabel (str or None, optional (default=None)) – Label for the y-axis.
savefig (bool (default=False)) – Whether to save the figure.
fname (str or None, optional (default=None)) – Name for the figure file.
**kwargs (keyword arguments) – Keyword arguments for matplotlib.pyplot.savefig().
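The Kolmogorov-Smirnov statistic behind this plot is the largest gap between the cumulative score distributions of events and non-events. The following sketch computes it in plain Python on hypothetical data; it assumes binary labels coded as 0 and 1 and is not the code used by `plot_ks`.

```python
def ks_statistic(y, y_pred):
    """Maximum gap between the cumulative score distributions of events and non-events."""
    n_event = sum(y)
    n_nonevent = len(y) - n_event
    cum_event = cum_nonevent = 0
    best = 0.0
    pairs = sorted(zip(y_pred, y))
    for k, (score, target) in enumerate(pairs):
        if target == 1:
            cum_event += 1
        else:
            cum_nonevent += 1
        # Evaluate the gap only after all samples tied at this score are counted.
        if k + 1 < len(pairs) and pairs[k + 1][0] == score:
            continue
        gap = abs(cum_event / n_event - cum_nonevent / n_nonevent)
        best = max(best, gap)
    return best


# Hypothetical labels and predicted probabilities.
y = [0, 0, 1, 1]
y_pred = [0.1, 0.4, 0.35, 0.8]

ks = ks_statistic(y, y_pred)
```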