Outlier detection

class optbinning.binning.outlier.OutlierDetector

Bases: object

Base class for all outlier detectors.

fit(x)

Fit univariate outlier detector.

Parameters

x (array-like, shape = (n_samples)) – Data samples in which to detect outliers.

Returns

self

Return type

OutlierDetector

get_support(indices=False)

Get a mask, or integer indices, of the excluded samples, i.e., the samples detected as outliers.

Parameters

indices (boolean (default False)) – If True, the return value will be an array of integers, rather than a boolean mask.

Returns

support – An index that selects the excluded samples from a vector. If indices is False, this is a boolean array, in which an element is True iff its corresponding sample is excluded. If indices is True, this is an integer array whose values are indices into the input vector.

Return type

array, shape = (n_samples)
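The two return forms of get_support can be illustrated without the library. The following is only a sketch on plain Python lists: the mask and data values are made up for illustration, and it does not call optbinning.

```python
# Hypothetical support mask: True marks a sample flagged as an outlier
# (the form described for indices=False).
mask = [False, True, False, False, True]

# Equivalent integer-index form (the form described for indices=True).
indices = [i for i, flagged in enumerate(mask) if flagged]

# Made-up data vector aligned with the mask.
x = [1.0, 250.0, 2.0, 3.0, -90.0]

# Select the excluded samples via indices, and the kept samples via the mask.
outliers = [x[i] for i in indices]
clean = [v for v, flagged in zip(x, mask) if not flagged]
```

Both forms select the same samples; the boolean mask is convenient for filtering, the index array for reporting positions.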

class optbinning.binning.outlier.RangeDetector(interval_length=0.5, k=1.5, method='ETI')

Bases: sklearn.base.BaseEstimator, optbinning.binning.outlier.OutlierDetector

Interquartile range or interval-based outlier detection method.

With the default settings, this is the usual interquartile range method.

Parameters
  • interval_length (float (default=0.5)) – Length of the credible interval to compute, given as a fraction in [0, 1]. The default 0.5 corresponds to a 50% interval.

  • k (float (default=1.5)) – Tukey’s factor.

  • method (str (default="ETI")) – Method to compute credible intervals. Supported methods are the highest density interval (method="HDI") and the equal-tailed interval (method="ETI").
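As a rough illustration of how these parameters interact for method="ETI", the sketch below computes an equal-tailed interval from percentiles and widens it by k times its width. It is a standard-library approximation written for this page, not the library's implementation; the helper names percentile and range_outlier_mask, the linear-interpolation percentile rule, and the sample data are all assumptions.

```python
import math


def percentile(sorted_x, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_x) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return sorted_x[lo] + (sorted_x[hi] - sorted_x[lo]) * (pos - lo)


def range_outlier_mask(x, interval_length=0.5, k=1.5):
    """Flag samples outside an equal-tailed interval widened by k * width."""
    xs = sorted(x)
    # Equal tails: interval_length=0.5 gives the 25th and 75th percentiles.
    lower_q = 100.0 * (1.0 - interval_length) / 2.0
    upper_q = 100.0 * (1.0 + interval_length) / 2.0
    lb = percentile(xs, lower_q)
    ub = percentile(xs, upper_q)
    width = ub - lb  # the interquartile range under the defaults
    lower_fence = lb - k * width
    upper_fence = ub + k * width
    return [v < lower_fence or v > upper_fence for v in x]


data = [10, 12, 11, 13, 12, 11, 10, 95]  # made-up sample
mask = range_outlier_mask(data)
```

With the defaults this reduces to Tukey's fences on the quartiles, so only the value 95 is flagged in this sample.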

fit(x)

Fit univariate outlier detector.

Parameters

x (array-like, shape = (n_samples)) – Data samples in which to detect outliers.

Returns

self

Return type

OutlierDetector

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

get_support(indices=False)

Get a mask, or integer indices, of the excluded samples, i.e., the samples detected as outliers.

Parameters

indices (boolean (default False)) – If True, the return value will be an array of integers, rather than a boolean mask.

Returns

support – An index that selects the excluded samples from a vector. If indices is False, this is a boolean array, in which an element is True iff its corresponding sample is excluded. If indices is True, this is an integer array whose values are indices into the input vector.

Return type

array, shape = (n_samples)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance
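The <component>__<parameter> naming convention can be shown with a toy helper. This is only a sketch of how such keys split into a component and a parameter name; split_nested_params and the component name "detector" are hypothetical, and this is not scikit-learn's code.

```python
def split_nested_params(params):
    """Group keys of the form component__name; plain keys go under ''."""
    grouped = {}
    for key, value in params.items():
        # Split on the first double underscore only.
        component, sep, name = key.partition("__")
        if sep:
            grouped.setdefault(component, {})[name] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped


# "k" targets the estimator itself; "detector__threshold" targets a
# hypothetical nested component named "detector".
grouped = split_nested_params({"k": 2.0, "detector__threshold": 4.0})
```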

class optbinning.binning.outlier.ModifiedZScoreDetector(threshold=3.5)

Bases: sklearn.base.BaseEstimator, optbinning.binning.outlier.OutlierDetector

Modified Z-score method.

Parameters

threshold (float (default=3.5)) – Samples whose modified Z-score has an absolute value greater than the threshold are labeled as outliers.

References

[IH93] B. Iglewicz and D. Hoaglin. “Volume 16: How to Detect and Handle Outliers”, The ASQC Basic References in Quality Control: Statistical Techniques, Edward F. Mykytka, Ph.D., Editor, 1993.
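For orientation, the modified Z-score of Iglewicz and Hoaglin scales each sample's deviation from the median by the median absolute deviation (MAD), using the constant 0.6745. The sketch below applies that formula with the standard library only. The function name modified_zscore_mask, the handling of a zero MAD, and the sample data are assumptions made for illustration, not taken from the library.

```python
import statistics


def modified_zscore_mask(x, threshold=3.5):
    """Flag samples whose |modified Z-score| exceeds the threshold."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])
    if mad == 0:
        # Assumption of this sketch: with zero MAD the score is undefined,
        # so no sample is flagged.
        return [False] * len(x)
    # Modified Z-score: 0.6745 * (x_i - median) / MAD.
    scores = [0.6745 * (v - med) / mad for v in x]
    return [abs(s) > threshold for s in scores]


data = [10, 12, 11, 13, 12, 11, 10, 95]  # made-up sample
mask = modified_zscore_mask(data)
```

Because both the center and the scale are medians, a single extreme value such as 95 barely shifts them, which is why it stands out clearly against the threshold.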

fit(x)

Fit univariate outlier detector.

Parameters

x (array-like, shape = (n_samples)) – Data samples in which to detect outliers.

Returns

self

Return type

OutlierDetector

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

get_support(indices=False)

Get a mask, or integer indices, of the excluded samples, i.e., the samples detected as outliers.

Parameters

indices (boolean (default False)) – If True, the return value will be an array of integers, rather than a boolean mask.

Returns

support – An index that selects the excluded samples from a vector. If indices is False, this is a boolean array, in which an element is True iff its corresponding sample is excluded. If indices is True, this is an integer array whose values are indices into the input vector.

Return type

array, shape = (n_samples)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance