Journal Article Details
IEEE Access
A MapReduce Opinion Mining for COVID-19-Related Tweets Classification Using Enhanced ID3 Decision Tree Classifier
Fatima Es-Sabery [1], Abdellatif Hair [1], Khadija Es-Sabery [2], Junaid Qadir [3], Beatriz Sainz-De-Abajo [4], Isabel De La Torre-Diez [4], Begona Garcia-Zapirain [5]
[1] Department of Computer Science, Faculty of Sciences and Technology, Sultan Moulay Slimane University, Beni Mellal, Morocco
[2] Department of Computer Science, National School of Applied Sciences, Cadi Ayyad University, Marrakech, Morocco
[3] Department of Electronics, Quaid-i-Azam University, Islamabad, Pakistan
[4] Department of Signal Theory, Communications and Telematics Engineering, University of Valladolid, Valladolid, Spain
[5] eVIDA Research Group, University of Deusto, Bilbao, Spain
Keywords: ID3 decision tree; opinion mining; Hadoop; HDFS; MapReduce; feature extractors
DOI: 10.1109/ACCESS.2021.3073215
Source: DOAJ
【 Abstract 】

Opinion Mining (OM) is a field of Natural Language Processing (NLP) that aims to capture human sentiment in a given text. With the continued spread of online shopping websites, micro-blogging sites, and social media platforms, OM on online social media has piqued the interest of thousands of researchers, because the reviews, tweets, and blogs gathered from these networks serve as a significant source for improving the decision-making process. The collected textual data (reviews, tweets, or blogs) are classified into three class labels (negative, neutral, and positive) in order to analyze and extract relevant information from the given dataset. In this contribution, we introduce an innovative MapReduce-based improved weighted ID3 decision tree classification approach for OM, which consists of three main stages. First, we use several feature extractors to efficiently detect and capture the relevant data from the given tweets, including N-grams or character-level features, Bag-Of-Words, word embedding (GloVe, Word2Vec), FastText, and TF-IDF. Second, we apply multiple feature selectors to reduce the high dimensionality of the features, including Chi-square, Gain Ratio, Information Gain, and Gini Index. Finally, we use the resulting features to carry out the classification task with an improved ID3 decision tree classifier, which computes a weighted information gain instead of the information gain used in traditional ID3. To measure the weighted information gain of the current conditioned feature, we follow two steps: first, we compute the weighted correlation function of that feature; second, we multiply the resulting value by the information gain of the same feature. This work is implemented in a distributed environment using the Hadoop framework, with its programming model MapReduce and its distributed file system HDFS. Its primary goal is to improve the performance of the well-known ID3 classifier in terms of accuracy, execution time, and ability to handle massive datasets. We carried out several experiments to assess the effectiveness of our proposed classifier against other contributions chosen from the literature. The experimental results show that our ID3 classifier performs better on the COVID-19_Sentiments dataset than the other classifiers in terms of recall (85.72%), specificity (86.51%), error rate (11.18%), false-positive rate (13.49%), execution time (15.95 s), kappa statistic (87.69%), F1-score (85.54%), classification rate (88.82%), false-negative rate (14.28%), precision (86.67%), convergence (it converges around iteration 90), stability (it is more stable, with a mean standard deviation of 0.12%), and complexity (it requires much less computation time and space).
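
The two-step weighted information gain described in the abstract can be illustrated with a minimal Python sketch. The entropy and information gain functions follow the standard ID3 definitions. The abstract does not define the weighted correlation function, so the `weight` argument here is a hypothetical stand-in supplied by the caller, and the function names and toy data are illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Standard ID3 information gain of splitting `labels` on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after the split, weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def weighted_information_gain(feature_values, labels, weight):
    """Step 2 of the abstract: multiply the feature's weight by its information gain.

    ASSUMPTION: `weight` stands in for the paper's weighted correlation
    function, whose formula is not given in the abstract.
    """
    return weight * information_gain(feature_values, labels)


# Toy data (hypothetical): four tweets, two binary features.
labels = ["positive", "positive", "negative", "negative"]
has_word = [1, 1, 0, 0]   # separates the classes perfectly
noise = [1, 0, 1, 0]      # carries no information about the class

print(information_gain(has_word, labels))
print(information_gain(noise, labels))
print(weighted_information_gain(has_word, labels, weight=0.5))
```

In an ID3-style tree, the feature with the largest score would be chosen as the split at each node; replacing the plain gain with the weighted one changes only the ranking criterion, not the tree-building procedure.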

【 License 】

Unknown   
