A Hybrid Ensemble Method for Multiclass Classification and Outlier Detection

  • Dalton Ndirangu Lecturer, United States International University-Africa, P.O. Box 14634 00800, Nairobi, Kenya
  • Waweru Mwangi Professor, Jomo Kenyatta University of Agriculture and Technology, P.O. Box 62,000 – 00200 Nairobi, Kenya
  • Lawrence Nderu Lecturer, Jomo Kenyatta University of Agriculture and Technology, P.O. Box 62,000 – 00200 Nairobi, Kenya
Keywords: Multiclass, Outlier, Classification, Classifiers, Ensemble.

Abstract

Multiclass classification has continued to be an active research area because of the challenges posed by imbalanced datasets and the lack of a unifying classification algorithm. Real-world problems are multiclass in nature, with skewed class representations. This study focused on the challenges of multiclass classification, using multiclass datasets from the UCI Machine Learning Repository. The research developed a heterogeneous ensemble model for multiclass classification and outlier detection that combined several strategies and ensemble techniques. Preprocessing involved filtering global outliers and resampling the datasets using the synthetic minority oversampling technique (SMOTE). The datasets were binarized using the One-vs-One decomposition technique. The heterogeneous ensemble model was constructed using the AdaBoost and random subspace algorithms, with random forest as the base classifier. The resulting classifiers were combined using the average-of-probabilities voting rule and evaluated using 10-fold stratified cross-validation. The model showed better performance in outlier detection and classification prediction for the multiclass problem, and it outperformed other commonly used classical algorithms. The findings established that proper preprocessing and decomposition of multiclass problems result in improved performance on minority outlier classes while safeguarding the integrity of the majority classes.
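Two of the steps named in the abstract, One-vs-One decomposition and the average-of-probabilities voting rule, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`one_vs_one_pairs`, `binarize`, `average_of_probabilities`), the toy labels, and the member probabilities are all hypothetical and chosen only for illustration, and the sketch assumes each ensemble member outputs a probability for every class.

```python
from collections import defaultdict
from itertools import combinations


def one_vs_one_pairs(labels):
    """Return every unordered pair of distinct classes: K*(K-1)/2 pairs for K classes."""
    return list(combinations(sorted(set(labels)), 2))


def binarize(X, y, pair):
    """Keep only the samples whose label is one of the two classes in `pair`."""
    return [(x, label) for x, label in zip(X, y) if label in pair]


def average_of_probabilities(member_probs):
    """Average per-class probabilities over all ensemble members.

    `member_probs` is a list of dicts mapping class label -> probability,
    one dict per ensemble member. Returns (winning class, averaged dict).
    """
    totals = defaultdict(float)
    for probs in member_probs:
        for cls, p in probs.items():
            totals[cls] += p
    n = len(member_probs)
    averaged = {cls: total / n for cls, total in totals.items()}
    winner = max(averaged, key=averaged.get)
    return winner, averaged


# Toy data (illustrative only): five samples, three classes.
X = [[0.1], [0.4], [0.9], [0.2], [0.8]]
y = ["a", "b", "c", "a", "c"]

pairs = one_vs_one_pairs(y)           # three binary sub-problems
subset_ac = binarize(X, y, ("a", "c"))  # samples for the a-vs-c sub-problem

# Hypothetical probability outputs from three ensemble members for one sample.
member_probs = [
    {"a": 0.6, "b": 0.3, "c": 0.1},
    {"a": 0.2, "b": 0.5, "c": 0.3},
    {"a": 0.5, "b": 0.25, "c": 0.25},
]
winner, averaged = average_of_probabilities(member_probs)
print(pairs, len(subset_ac), winner)
```

Averaging probabilities, rather than counting one vote per member, lets a confident member outweigh a hesitant one, which is one plausible reason to prefer this rule when members differ in reliability across classes.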

Published
2019-04-18
Section
Articles