Decision Tree’s Features of Application in Classification Problems

UDC 004.855.5

I.L. Kaftannikov, South Ural State University, Chelyabinsk, Russian Federation, kil@is74.ru

A.V. Parasich, South Ural State University, Chelyabinsk, Russian Federation, parasich_av@yandex.ru

Abstract

The article describes the application of decision trees to classification problems. In recent years, decision trees have been widely used in computer vision and other machine learning tasks, including object recognition, text classification, gesture recognition, spam detection, learning to rank for information retrieval, semantic segmentation, and data clustering. Their adoption is facilitated by distinctive features such as interpretability, controllability, and automatic feature selection. However, decision trees also have a number of fundamental shortcomings that make training them considerably more difficult. The article analyzes the advantages and disadvantages of decision trees and considers the issues that arise in training and testing them. Particular attention is given to the balance of the training dataset. We also consider decision forests and methods for training them. A brief overview is given of methods for reducing the interdependence of errors among the decision trees in a forest during training. Methods for overcoming the drawbacks of decision trees are proposed, and the results of applying these methods are presented.

Keywords

decision trees, decision forests, machine learning, classification

Source

Bulletin of the South Ural State University. Ser. Computer Technologies, Automatic Control, Radio Electronics, 2015, vol. 15, no. 3, pp. 26–32. (in Russ.) (Computer Science and Engineering)