IEEE Access
An Imbalanced Big Data Mining Framework for Improving Optimization Algorithms Performance
Ali Ibrahim El-Desouky [1], Eslam Mohsen Hassib [1], El-Sayed M. El-Kenawy [2], Sally M. El-Ghamrawy [3]
[1] Computer Engineering and Systems Department, Faculty of Engineering, Mansoura University, Mansoura, Egypt
[2] Department of Computer and Systems Engineering, Delta Higher Institute for Engineering and Technology (DHIET), Mansoura, Egypt
[3] Head of Communications and Computer Engineering Department, MISR Higher Institute for Engineering and Technology, Mansoura, Egypt
Keywords: Grey wolf optimizer; neural network; big data mining; deep learning; imbalanced data sets; optimization
DOI: 10.1109/ACCESS.2019.2955983
Source: DOAJ
Abstract
Big data is an important factor in almost all of today's technologies, such as social media, smart cities, and the Internet of Things. Most standard classifiers tend to become trapped in local optima when dealing with such massive data sets, so new techniques for handling them are needed. This paper presents a novel imbalanced big data mining framework that improves the performance of optimization algorithms by eliminating the local optima problem. The framework consists of three main stages. First, the preprocessing stage uses the LSH-SMOTE algorithm to solve the class imbalance problem. It then uses locality-sensitive hashing (LSH) to hash the data set instances into buckets. Second, the bucket search stage uses the grey wolf optimizer (GWO) to train a bidirectional recurrent neural network (BRNN) and to search for the global optimum in each bucket. Finally, the final tournament winner stage uses GWO+BRNN to find the global optimum of the whole data set among the global optima of all buckets. The proposed framework, LSHGWOBRNN, was tested on 9 data sets, one of which is a big data set. It was compared in terms of AUC and MSE against seven well-known machine learning algorithms: Naive Bayes, Random Tree, Decision Table, AdaBoostM1, WOA+MLP, GWO+MLP, and WOA+BRNN. We then tested it on four well-known data sets against GWO+MLP, ACO+MLP, GA+MLP, PSO+MLP, PBIL+MLP, and ES+MLP in terms of classification accuracy and MSE. The experimental results show that LSHGWOBRNN provides high local optima avoidance, higher accuracy, and lower complexity and overhead.
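The three stages described in the abstract (hash instances into buckets, run GWO inside each bucket, then hold a tournament among the bucket winners) can be illustrated with a small, standard-library-only Python sketch. This is not the authors' implementation, and it rests on several assumptions:

- The LSH family is taken to be random-hyperplane (sign-of-projection) hashing, since the abstract does not name one.
- A single sigmoid unit trained by GWO stands in for the BRNN.
- The LSH-SMOTE oversampling step is omitted.
- The helper names (`lsh_buckets`, `mse`, `gwo`, `tournament`) are hypothetical.
- All parameter values (two hyperplanes, 8 wolves, 30 iterations, weight bounds of [-5, 5]) are arbitrary illustrative choices.

```python
import math
import random

# Hypothetical sketch: random-hyperplane LSH + grey wolf optimizer (GWO)
# + final tournament. A sigmoid unit stands in for the paper's BRNN,
# and the LSH-SMOTE oversampling step is not shown.

def lsh_buckets(data, n_planes, rng):
    """Bucket (x, y) pairs by the sign pattern of x projected onto random hyperplanes."""
    dim = len(data[0][0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
    buckets = {}
    for x, y in data:
        key = tuple(int(sum(p * xi for p, xi in zip(plane, x)) >= 0) for plane in planes)
        buckets.setdefault(key, []).append((x, y))
    return buckets

def mse(weights, data):
    """Mean squared error of a sigmoid unit; weights[-1] is the bias."""
    total = 0.0
    for x, y in data:
        z = weights[-1] + sum(w * xi for w, xi in zip(weights, x))
        z = max(-50.0, min(50.0, z))  # guard math.exp against overflow
        total += (1.0 / (1.0 + math.exp(-z)) - y) ** 2
    return total / len(data)

def gwo(objective, dim, n_wolves, n_iter, lb, ub, rng):
    """Minimize objective over [lb, ub]^dim with the grey wolf optimizer (n_wolves >= 3)."""
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []
    for t in range(n_iter):
        # Alpha, beta, delta: the three best positions seen so far.
        leaders = sorted(leaders + wolves, key=objective)[:3]
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 toward 0
        for i, wolf in enumerate(wolves):
            new = []
            for j in range(dim):
                pulls = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - wolf[j])
                    pulls.append(leader[j] - A * D)
                # New coordinate: mean of the three leader-guided moves, clipped to bounds.
                new.append(min(ub, max(lb, sum(pulls) / 3.0)))
            wolves[i] = new  # replace (not mutate) so stored leaders stay intact
    return min(leaders + wolves, key=objective)

def tournament(data, n_planes=2, seed=0):
    """Train one GWO winner per LSH bucket, then keep the best on the whole data set."""
    rng = random.Random(seed)
    dim = len(data[0][0]) + 1  # one weight per feature plus a bias
    winners = [
        gwo(lambda w, b=bucket: mse(w, b), dim, 8, 30, -5.0, 5.0, rng)
        for bucket in lsh_buckets(data, n_planes, rng).values()
    ]
    return min(winners, key=lambda w: mse(w, data))

# Usage on a toy 2-D problem: the label is 1 when x0 + x1 > 0.
rng = random.Random(1)
points = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(60)]
data = [(p, 1.0 if p[0] + p[1] > 0 else 0.0) for p in points]
best = tournament(data)
print("weights:", [round(w, 2) for w in best], "MSE:", round(mse(best, data), 3))
```

In this sketch, alpha, beta, and delta are the three best positions seen so far, not just the current iteration's best. As a result, the returned weights are never worse than any position evaluated earlier. The final `min` over the per-bucket winners mirrors the tournament stage: each bucket's local winner is rescored on the whole data set, and the lowest-error one is kept.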
License: Unknown