Outlier detection
-
class optbinning.binning.outlier.OutlierDetector
Bases: object
Base class for all outlier detectors.
-
fit(x, y=None)
Fit outlier detector.
- Parameters
x (array-like, shape = (n_samples)) –
y (array-like, shape = (n_samples) or None (default=None)) –
- Returns
self
- Return type
-
get_support(indices=False)
Get a mask, or integer index, of the excluded samples, i.e., the samples detected as outliers.
- Parameters
indices (boolean (default False)) – If True, the return value will be an array of integers, rather than a boolean mask.
- Returns
support – An index that selects the excluded samples from a vector. If indices is False, this is a boolean array, in which an element is True iff its corresponding sample is excluded. If indices is True, this is an integer array whose values are indices into the input vector.
- Return type
array, shape = (n_samples)
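The two return forms described for get_support carry the same information. The sketch below uses plain Python lists and made-up data (the names mask, indices and values are illustrative, not part of the library) to show how a boolean mask and an integer index select the same excluded samples.

```python
# Hypothetical boolean mask, shaped like the result described for
# get_support(indices=False): True marks a sample detected as an outlier.
mask = [False, True, False, False, True]

# Equivalent integer-index form, as described for get_support(indices=True).
indices = [i for i, flagged in enumerate(mask) if flagged]

# Made-up input vector the mask refers to.
values = [10.0, 250.0, 11.5, 9.8, -300.0]

# Select the excluded samples with either representation.
outliers_from_mask = [v for v, flagged in zip(values, mask) if flagged]
outliers_from_indices = [values[i] for i in indices]

# Keep the remaining samples by inverting the mask.
clean = [v for v, flagged in zip(values, mask) if not flagged]
```

The mask is convenient for filtering in place, while the index form is shorter when only a few samples are flagged.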
-
-
class optbinning.binning.outlier.RangeDetector(interval_length=0.5, k=1.5, method='ETI')
Bases: sklearn.base.BaseEstimator, optbinning.binning.outlier.OutlierDetector
Interquartile range or interval based outlier detection method.
The default settings compute the usual interquartile range method.
- Parameters
interval_length (float (default=0.5)) – Length of the credible interval to compute, given as a fraction in [0, 1]. The default 0.5 corresponds to a 50% interval.
k (float (default=1.5)) – Tukey’s factor.
method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (method="HDI") and Equal-tailed interval (method="ETI").
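As an illustration of why the default settings correspond to the usual interquartile range method, the following pure-Python sketch builds Tukey fences from an equal-tailed interval. It is not the library's implementation: the helper names percentile and range_outliers are hypothetical, linear interpolation between order statistics is an assumption, and the HDI method is not covered.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def range_outliers(x, interval_length=0.5, k=1.5):
    """Flag values outside Tukey fences built on an equal-tailed interval."""
    s = sorted(x)
    # Equal-tailed interval: cut the same probability mass from each tail.
    lower_q = 100 * (1 - interval_length) / 2
    upper_q = 100 * (1 + interval_length) / 2
    lb = percentile(s, lower_q)
    ub = percentile(s, upper_q)
    width = ub - lb  # equals the IQR when interval_length=0.5
    low_fence = lb - k * width
    high_fence = ub + k * width
    return [v < low_fence or v > high_fence for v in x]


# Made-up data with one extreme value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
mask = range_outliers(x)
```

With interval_length=0.5 the interval runs from the 25th to the 75th percentile, so the fences reduce to the familiar Q1 - 1.5 IQR and Q3 + 1.5 IQR when k=1.5.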
-
fit(x, y=None)
Fit outlier detector.
- Parameters
x (array-like, shape = (n_samples)) –
y (array-like, shape = (n_samples) or None (default=None)) –
- Returns
self
- Return type
-
get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
-
get_support(indices=False)
Get a mask, or integer index, of the excluded samples, i.e., the samples detected as outliers.
- Parameters
indices (boolean (default False)) – If True, the return value will be an array of integers, rather than a boolean mask.
- Returns
support – An index that selects the excluded samples from a vector. If indices is False, this is a boolean array, in which an element is True iff its corresponding sample is excluded. If indices is True, this is an integer array whose values are indices into the input vector.
- Return type
array, shape = (n_samples)
-
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
-
class optbinning.binning.outlier.ModifiedZScoreDetector(threshold=3.5)
Bases: sklearn.base.BaseEstimator, optbinning.binning.outlier.OutlierDetector
Modified Z-score method.
- Parameters
threshold (float (default=3.5)) – Samples whose modified Z-score has an absolute value greater than the threshold are labeled as outliers.
References
- IH93
B. Iglewicz and D. Hoaglin. “Volume 16: How to Detect and Handle Outliers”, The ASQC Basic References in Quality Control: Statistical Techniques, Edward F. Mykytka, Ph.D., Editor, 1993.
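For orientation, the modified Z-score of Iglewicz and Hoaglin is commonly written as M_i = 0.6745 (x_i - median) / MAD, where MAD is the median absolute deviation. The sketch below applies that formula in pure Python. The function name modified_zscore_outliers is hypothetical, and returning no outliers when MAD is zero is an assumption of this sketch, not documented library behavior.

```python
from statistics import median


def modified_zscore_outliers(x, threshold=3.5):
    """Flag values whose absolute modified Z-score exceeds the threshold."""
    med = median(x)
    # Median absolute deviation from the median.
    mad = median([abs(v - med) for v in x])
    if mad == 0:
        # Assumption of this sketch: with zero spread, flag nothing.
        return [False] * len(x)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in x]


# Made-up data: five similar readings and one extreme value.
x = [10.0, 11.0, 9.5, 10.5, 10.2, 50.0]
mask = modified_zscore_outliers(x)
```

Because both the center and the scale are medians, a single extreme value barely shifts them, which is what makes this score more robust than the ordinary Z-score.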
-
fit(x, y=None)
Fit outlier detector.
- Parameters
x (array-like, shape = (n_samples)) –
y (array-like, shape = (n_samples) or None (default=None)) –
- Returns
self
- Return type
-
get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
-
get_support(indices=False)
Get a mask, or integer index, of the excluded samples, i.e., the samples detected as outliers.
- Parameters
indices (boolean (default False)) – If True, the return value will be an array of integers, rather than a boolean mask.
- Returns
support – An index that selects the excluded samples from a vector. If indices is False, this is a boolean array, in which an element is True iff its corresponding sample is excluded. If indices is True, this is an integer array whose values are indices into the input vector.
- Return type
array, shape = (n_samples)
-
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
-
class optbinning.binning.outlier.YQuantileDetector(outlier_detector='zscore', outlier_params=None, n_bins=5)
Bases: sklearn.base.BaseEstimator, optbinning.binning.outlier.OutlierDetector
Outlier detector on the y-axis over quantiles.
- Parameters
outlier_detector (str or None, optional (default="zscore")) – The outlier detection method. Supported methods are “range” to use the interquartile range based method or “zscore” to use the modified Z-score method.
outlier_params (dict or None, optional (default=None)) – Dictionary of parameters to pass to the outlier detection method.
n_bins (int (default=5)) – The maximum number of bins to consider.
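One way to read "outlier detector on the y-axis over quantiles" is: split the samples into quantile bins of x, then look for outliers in y inside each bin. The sketch below follows that reading in pure Python and should be taken as an assumption about the idea, not as the library's code. The helper names mz_mask and y_quantile_outliers are hypothetical, the bins are equal-count slices of the x-ordered samples, and the per-bin detector is a modified Z-score.

```python
from statistics import median


def mz_mask(values, threshold=3.5):
    """Modified Z-score mask; flags nothing when the MAD is zero."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


def y_quantile_outliers(x, y, n_bins=5, threshold=3.5):
    """Flag y-outliers separately within equal-count bins of x."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # sample indices sorted by x
    mask = [False] * n
    for b in range(n_bins):
        # Equal-count slices of the x-ordered samples approximate quantile bins.
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        flags = mz_mask([y[i] for i in idx], threshold)
        for i, flagged in zip(idx, flags):
            mask[i] = flagged
    return mask


# Made-up data: y grows linearly with x, except for one locally odd sample.
x = [float(v) for v in range(20)]
y = [2.0 * v for v in x]
y[2] = 30.0  # within the global range of y, but unusual for small x

mask = y_quantile_outliers(x, y)
global_mask = mz_mask(y)
```

In this example the sample at index 2 is flagged by the per-bin check but not by a single pass over all of y, which illustrates why conditioning on x can matter when y trends with x.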
-
fit(x, y=None)
Fit outlier detector.
- Parameters
x (array-like, shape = (n_samples)) –
y (array-like, shape = (n_samples) or None (default=None)) –
- Returns
self
- Return type
-
get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
-
get_support(indices=False)
Get a mask, or integer index, of the excluded samples, i.e., the samples detected as outliers.
- Parameters
indices (boolean (default False)) – If True, the return value will be an array of integers, rather than a boolean mask.
- Returns
support – An index that selects the excluded samples from a vector. If indices is False, this is a boolean array, in which an element is True iff its corresponding sample is excluded. If indices is True, this is an integer array whose values are indices into the input vector.
- Return type
array, shape = (n_samples)
-
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance