A Hybrid Ensemble Method for Multiclass Classification and Outlier Detection
Abstract
Multiclass classification remains an active research area because of the challenges posed by imbalanced datasets and the lack of a unifying classification algorithm. Real-world problems are multiclass in nature, with skewed class representations. This study focused on the challenges of multiclass classification, using multiclass datasets from the UCI Machine Learning Repository. The research developed a heterogeneous ensemble model for multiclass classification and outlier detection that combines several strategies and ensemble techniques. Preprocessing involved filtering global outliers and resampling the datasets with the synthetic minority oversampling technique (SMOTE). The datasets were binarized using the One-vs-One decomposition technique. The heterogeneous ensemble was constructed using the AdaBoost and random subspace algorithms, with random forest as the base classifier. The resulting classifiers were combined using the average-of-probabilities voting rule and evaluated using 10-fold stratified cross-validation. The model showed better performance in outlier detection and class prediction for multiclass problems, and it outperformed other commonly used classical algorithms. The findings established that proper preprocessing and decomposition of multiclass problems result in improved performance on minority outlier classes while safeguarding the integrity of the majority classes.
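As an illustration of two steps named in the abstract, the following is a minimal sketch of how a One-vs-One decomposition enumerates class pairs and how an average-of-probabilities voting rule might combine the outputs of several ensemble members. It is not the authors' implementation: the function names `one_vs_one_pairs` and `average_of_probabilities`, and the dictionary input format, are assumptions made for this example only.

```python
from itertools import combinations


def one_vs_one_pairs(classes):
    """Return every unordered class pair a One-vs-One decomposition trains on.

    For k distinct classes this yields k * (k - 1) / 2 binary problems.
    """
    return list(combinations(sorted(set(classes)), 2))


def average_of_probabilities(member_probs):
    """Combine ensemble members by averaging their class distributions.

    member_probs is a list of dicts mapping class label -> probability,
    one dict per ensemble member (a hypothetical input format).
    Returns (predicted_label, averaged_distribution).
    """
    labels = sorted({label for probs in member_probs for label in probs})
    n_members = len(member_probs)
    averaged = {
        label: sum(probs.get(label, 0.0) for probs in member_probs) / n_members
        for label in labels
    }
    # The predicted class is the one with the highest mean probability.
    predicted = max(labels, key=lambda label: averaged[label])
    return predicted, averaged


if __name__ == "__main__":
    # Three classes give three binary sub-problems.
    print(one_vs_one_pairs(["setosa", "versicolor", "virginica"]))

    # Made-up outputs from three ensemble members for a single sample.
    members = [
        {"minority": 0.6, "majority": 0.4},
        {"minority": 0.2, "majority": 0.8},
        {"minority": 0.5, "majority": 0.5},
    ]
    print(average_of_probabilities(members))
```

Averaging probabilities, rather than counting hard votes, lets a confident member outweigh several uncertain ones, which is one plausible reason to prefer this rule when minority classes are hard to separate.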