Outlier detection

class optbinning.binning.outlier.OutlierDetector

Bases: object

Base class for all outlier detectors.

fit(x, y=None)

Fit outlier detector.

Parameters
  • x (array-like, shape = (n_samples))

  • y (array-like, shape = (n_samples) or None (default=None))

Returns

self

Return type

OutlierDetector

get_support(indices=False)

Get a mask, or integer index, of the samples excluded, i.e., the samples detected as outliers.

Parameters

indices (boolean (default False)) – If True, the return value will be an array of integers, rather than a boolean mask.

Returns

support – An index that selects the excluded samples from a vector. If indices is False, this is a boolean array, in which an element is True iff its corresponding sample is excluded. If indices is True, this is an integer array whose values are indices into the input vector.

Return type

array, shape = (n_samples)
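The fit / get_support contract can be illustrated with a toy stand-in. The sketch below is not optbinning's code: ToyDetector and its cutoff rule are hypothetical, and it uses plain lists instead of arrays so that it runs with the standard library only. It shows the two return forms of get_support.

```python
class ToyDetector:
    """Hypothetical stand-in that mimics the OutlierDetector interface."""

    def __init__(self, cutoff=100):
        self.cutoff = cutoff  # illustrative rule only, not a library parameter

    def fit(self, x, y=None):
        # Mark a sample as excluded when its magnitude exceeds the cutoff.
        self._support = [abs(v) > self.cutoff for v in x]
        return self  # returning self allows call chaining

    def get_support(self, indices=False):
        if indices:
            # Integer positions of the excluded samples.
            return [i for i, flag in enumerate(self._support) if flag]
        # Boolean mask with one entry per sample.
        return list(self._support)


x = [3, 250, 7, -400, 12]
detector = ToyDetector().fit(x)
mask = detector.get_support()
idx = detector.get_support(indices=True)
kept = [v for v, flag in zip(x, mask) if not flag]
print(mask)  # [False, True, False, True, False]
print(idx)   # [1, 3]
print(kept)  # [3, 7, 12]
```

Note that the mask is True for excluded samples, so it has to be negated to select the samples to keep.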

class optbinning.binning.outlier.RangeDetector(interval_length=0.5, k=1.5, method='ETI')

Bases: sklearn.base.BaseEstimator, optbinning.binning.outlier.OutlierDetector

Interquartile range or interval-based outlier detection method.

With the default settings, this is the usual interquartile range method.

Parameters
  • interval_length (float (default=0.5)) – Length of the credible interval to compute, given as a fraction in [0, 1]. For example, 0.5 corresponds to the 50% credible interval.

  • k (float (default=1.5)) – Tukey’s factor.

  • method (str (default="ETI")) – Method to compute credible intervals. Supported methods are Highest Density interval (method="HDI") and Equal-tailed interval (method="ETI").
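As a rough illustration of how these parameters interact, the following sketch applies Tukey-style fences to an equal-tailed interval. It is an assumption-laden re-implementation using only the standard library, not the library's code: the helper names percentile and range_outliers are hypothetical, the percentile uses linear interpolation, and the HDI method is not covered.

```python
def percentile(sorted_x, q):
    """Linearly interpolated percentile of pre-sorted data, q in [0, 100]."""
    pos = (len(sorted_x) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def range_outliers(x, interval_length=0.5, k=1.5):
    """Flag samples outside [lb - k * width, ub + k * width]."""
    s = sorted(x)
    # Equal-tailed interval: leave the same probability mass in each tail.
    lb = percentile(s, 100 * (1 - interval_length) / 2)
    ub = percentile(s, 100 * (1 + interval_length) / 2)
    width = ub - lb  # the interquartile range when interval_length=0.5
    lower_fence = lb - k * width
    upper_fence = ub + k * width
    return [v < lower_fence or v > upper_fence for v in x]


data = [10, 12, 11, 13, 12, 11, 14, 13, 12, 95]
mask = range_outliers(data)
print(mask.index(True), sum(mask))  # 9 1
```

With interval_length=0.5 the interval endpoints are the 25th and 75th percentiles, which is why the defaults reduce to the familiar 1.5 × IQR rule.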

fit(x, y=None)

Fit outlier detector.

Parameters
  • x (array-like, shape = (n_samples))

  • y (array-like, shape = (n_samples) or None (default=None))

Returns

self

Return type

OutlierDetector

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

get_support(indices=False)

Get a mask, or integer index, of the samples excluded, i.e., the samples detected as outliers.

Parameters

indices (boolean (default False)) – If True, the return value will be an array of integers, rather than a boolean mask.

Returns

support – An index that selects the excluded samples from a vector. If indices is False, this is a boolean array, in which an element is True iff its corresponding sample is excluded. If indices is True, this is an integer array whose values are indices into the input vector.

Return type

array, shape = (n_samples)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

class optbinning.binning.outlier.ModifiedZScoreDetector(threshold=3.5)

Bases: sklearn.base.BaseEstimator, optbinning.binning.outlier.OutlierDetector

Modified Z-score method.

Parameters

threshold (float (default=3.5)) – Samples whose modified Z-score has an absolute value greater than the threshold are labeled as outliers.

References

[IH93] B. Iglewicz and D. Hoaglin. “Volume 16: How to Detect and Handle Outliers”, The ASQC Basic References in Quality Control: Statistical Techniques, Edward F. Mykytka, Ph.D., Editor, 1993.
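For reference, the modified Z-score of Iglewicz and Hoaglin is M_i = 0.6745 (x_i - median) / MAD, where MAD is the median absolute deviation from the median. The sketch below computes it with the standard library only. The function name modified_zscore_outliers is hypothetical, and returning no outliers when MAD is zero is a choice made for this sketch, not documented library behaviour.

```python
from statistics import median


def modified_zscore_outliers(x, threshold=3.5):
    """Flag samples whose |modified Z-score| exceeds the threshold."""
    med = median(x)
    # Median absolute deviation from the median.
    mad = median([abs(v - med) for v in x])
    if mad == 0:
        # Assumption of this sketch: scores are undefined, flag nothing.
        return [False] * len(x)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in x]


data = [10, 12, 11, 13, 12, 11, 14, 13, 12, 95]
mask = modified_zscore_outliers(data)
print(mask.index(True), sum(mask))  # 9 1
```

Because both the centre and the scale are medians, a single extreme value such as 95 barely moves them, which is what makes the score more robust than the ordinary Z-score.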

fit(x, y=None)

Fit outlier detector.

Parameters
  • x (array-like, shape = (n_samples))

  • y (array-like, shape = (n_samples) or None (default=None))

Returns

self

Return type

OutlierDetector

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

get_support(indices=False)

Get a mask, or integer index, of the samples excluded, i.e., the samples detected as outliers.

Parameters

indices (boolean (default False)) – If True, the return value will be an array of integers, rather than a boolean mask.

Returns

support – An index that selects the excluded samples from a vector. If indices is False, this is a boolean array, in which an element is True iff its corresponding sample is excluded. If indices is True, this is an integer array whose values are indices into the input vector.

Return type

array, shape = (n_samples)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

class optbinning.binning.outlier.YQuantileDetector(outlier_detector='zscore', outlier_params=None, n_bins=5)

Bases: sklearn.base.BaseEstimator, optbinning.binning.outlier.OutlierDetector

Outlier detector on the y-axis over quantiles.

Parameters
  • outlier_detector (str or None, optional (default="zscore")) – The outlier detection method. Supported methods are “range” to use the interquartile range based method or “zscore” to use the modified Z-score method.

  • outlier_params (dict or None, optional (default=None)) – Dictionary of parameters to pass to the outlier detection method.

  • n_bins (int (default=5)) – The maximum number of bins to consider.
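The idea of detecting outliers in y separately within quantile bins of x can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the helper names mz_mask and y_quantile_outliers are hypothetical, bins are formed by splitting the x-sorted samples into roughly equal-sized groups, and only the modified Z-score rule is applied.

```python
from statistics import median


def mz_mask(values, threshold=3.5):
    """Modified Z-score rule; flags nothing when the MAD is zero."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


def y_quantile_outliers(x, y, n_bins=5):
    """Flag y outliers within each equal-frequency bin of x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    support = [False] * len(x)
    n = len(order)
    for b in range(n_bins):
        # Positions of the b-th group of samples, ordered by x.
        chunk = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        flags = mz_mask([y[i] for i in chunk])
        for i, flag in zip(chunk, flags):
            support[i] = flag
    return support


x = list(range(20))
y = [10 * v for v in x]  # strong trend in y along x
y[7] = 300               # local spike, expected value is 70
support = y_quantile_outliers(x, y)
print([i for i, flag in enumerate(support) if flag])  # [7]
print(any(mz_mask(y)))  # False
```

In this example the spike is missed when the rule is applied to all of y at once, because the trend inflates the overall spread, but it stands out within its own bin.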

fit(x, y=None)

Fit outlier detector.

Parameters
  • x (array-like, shape = (n_samples))

  • y (array-like, shape = (n_samples) or None (default=None))

Returns

self

Return type

OutlierDetector

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

get_support(indices=False)

Get a mask, or integer index, of the samples excluded, i.e., the samples detected as outliers.

Parameters

indices (boolean (default False)) – If True, the return value will be an array of integers, rather than a boolean mask.

Returns

support – An index that selects the excluded samples from a vector. If indices is False, this is a boolean array, in which an element is True iff its corresponding sample is excluded. If indices is True, this is an integer array whose values are indices into the input vector.

Return type

array, shape = (n_samples)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance