MDLP discretization algorithm

class optbinning.MDLP(min_samples_split=2, min_samples_leaf=2, max_candidates=32)
Bases: sklearn.base.BaseEstimator
Minimum Description Length Principle (MDLP) discretization algorithm.
 Parameters
min_samples_split (int (default=2)) – The minimum number of samples required to split an internal node.
min_samples_leaf (int (default=2)) – The minimum number of samples required to be at a leaf node.
max_candidates (int (default=32)) – The maximum number of split points to evaluate at each partition.
Notes
Implements the discretization algorithm in [FI93]. A dynamic split strategy based on binning the number of candidate splits [CMR2001] is used to increase efficiency. For large datasets, a smaller max_candidates (e.g. 16) is recommended to obtain a significant speed-up.
References
 FI93
U. M. Fayyad and K. B. Irani. “Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning”. International Joint Conferences on Artificial Intelligence, 13:1022–1027, 1993.
 CMR2001
D. M. Chickering, C. Meek and R. Rounthwaite. “Efficient Determination of Dynamic Split Points in a Decision Tree”. In Proceedings of the 2001 IEEE International Conference on Data Mining, 91–98, 2001.
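The stopping rule described in [FI93] can be sketched in a few lines of plain Python. This is a minimal illustration of the published criterion, not the code used inside optbinning; the helper names entropy and mdlp_accepts are hypothetical, and both sides of the split are assumed to be non-empty.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlp_accepts(y_left, y_right):
    """Return True if the Fayyad-Irani MDLP criterion accepts the split.

    y_left and y_right hold the class labels on each side of a candidate
    split point (hypothetical helper, for illustration only).
    """
    y_all = list(y_left) + list(y_right)
    n = len(y_all)

    ent, ent_l, ent_r = entropy(y_all), entropy(y_left), entropy(y_right)

    # Information gain of the candidate split.
    gain = ent - (len(y_left) / n) * ent_l - (len(y_right) / n) * ent_r

    # Number of distinct classes in the full set and in each side.
    k, k_l, k_r = len(set(y_all)), len(set(y_left)), len(set(y_right))

    # Cost term from the minimum description length argument.
    delta = math.log2(3 ** k - 2) - (k * ent - k_l * ent_l - k_r * ent_r)

    return gain > (math.log2(n - 1) + delta) / n


print(mdlp_accepts([0] * 50, [1] * 50))      # perfectly separating split
print(mdlp_accepts([0, 1] * 5, [0, 1] * 5))  # split with zero gain
```

The first call accepts the split because the gain (1 bit) far exceeds the threshold; the second rejects it because the gain is zero. A recursive discretizer would apply this test to the best candidate in each partition and stop splitting once it fails.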

fit(x, y)
Fit MDLP discretization algorithm.
 Parameters
x (array-like, shape = (n_samples,)) – Data samples, where n_samples is the number of samples.
y (array-like, shape = (n_samples,)) – Target vector relative to x.
 Returns
self
 Return type

get_params(deep=True)
Get parameters for this estimator.
 Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
 Returns
params – Parameter names mapped to their values.
 Return type
dict

set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
 Parameters
**params (dict) – Estimator parameters.
 Returns
self – Estimator instance.
 Return type
estimator instance

property splits
List of split points.
 Returns
splits
 Return type
numpy.ndarray