MDLP discretization algorithm
- class optbinning.MDLP(min_samples_split=2, min_samples_leaf=2, max_candidates=32)
Bases: sklearn.base.BaseEstimator
Minimum Description Length Principle (MDLP) discretization algorithm.
- Parameters
min_samples_split (int (default=2)) – The minimum number of samples required to split an internal node.
min_samples_leaf (int (default=2)) – The minimum number of samples required to be at a leaf node.
max_candidates (int (default=32)) – The maximum number of split points to evaluate at each partition.
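To illustrate how min_samples_leaf and max_candidates can restrict the split points considered for one partition, here is a minimal pure-Python sketch. The helper candidate_splits is hypothetical and is not part of optbinning; the library's actual candidate selection may differ.

```python
def candidate_splits(x, min_samples_leaf=2, max_candidates=32):
    """Hypothetical helper: midpoints that keep enough samples on each side."""
    xs = sorted(x)
    n = len(xs)
    candidates = []
    # Position i splits the sorted data into xs[:i] and xs[i:], so both
    # sides hold at least min_samples_leaf samples within this range.
    for i in range(min_samples_leaf, n - min_samples_leaf + 1):
        if xs[i - 1] < xs[i]:  # only split between distinct values
            candidates.append((xs[i - 1] + xs[i]) / 2)
    if len(candidates) <= max_candidates:
        return candidates
    # Thin the list evenly so that only max_candidates points are evaluated.
    step = len(candidates) / max_candidates
    return [candidates[int(k * step)] for k in range(max_candidates)]


few = candidate_splits(range(10), min_samples_leaf=2)
capped = candidate_splits(range(100), min_samples_leaf=2, max_candidates=16)
```

A smaller max_candidates means fewer points are scored per partition, which is where the speed-up on large datasets would come from.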
Notes
Implementation of the discretization algorithm in [FI93]. A dynamic split strategy based on binning the number of candidate splits [CMR2001] is implemented to increase efficiency. For large datasets, a smaller max_candidates (e.g. 16) is recommended to obtain a significant speed-up.
References
- FI93
U. M. Fayyad and K. B. Irani. “Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning”. International Joint Conferences on Artificial Intelligence, 13:1022–1027, 1993.
- CMR2001
D. M. Chickering, C. Meek and R. Rounthwaite. “Efficient Determination of Dynamic Split Points in a Decision Tree”. In Proceedings of the 2001 IEEE International Conference on Data Mining, 91-98, 2001.
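The acceptance test from [FI93] that decides whether a candidate cut is kept can be sketched in a few lines. This is a standalone illustration of the published criterion, not the optbinning source; the function names entropy and mdlp_accepts are chosen here for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Class entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mdlp_accepts(labels, t):
    """Return True if cutting the x-sorted labels at index t passes the MDLP test."""
    left, right = labels[:t], labels[t:]
    n = len(labels)
    # Number of distinct classes in the full set and on each side of the cut.
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    ent, ent1, ent2 = entropy(labels), entropy(left), entropy(right)
    # Information gain of the cut.
    gain = ent - (len(left) / n) * ent1 - (len(right) / n) * ent2
    # Description-length penalty for encoding the cut.
    delta = log2(3 ** k - 2) - (k * ent - k1 * ent1 - k2 * ent2)
    return gain > (log2(n - 1) + delta) / n


separable = mdlp_accepts([0] * 10 + [1] * 10, 10)
uninformative = mdlp_accepts([0, 1] * 10, 10)
```

In the first call the cut separates the two classes perfectly and is accepted; in the second both halves have the same class mix as the whole, so the gain is zero and the cut is rejected.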
- fit(x, y)
Fit MDLP discretization algorithm.
- Parameters
x (array-like, shape = (n_samples,)) – Data samples, where n_samples is the number of samples.
y (array-like, shape = (n_samples,)) – Target vector relative to x.
- Returns
self
- Return type
MDLP
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- property splits
List of split points.
- Returns
splits
- Return type
numpy.ndarray