Journal Article Details
Symmetry
Intelligent Detection of False Information in Arabic Tweets Utilizing Hybrid Harris Hawks Based Feature Selection and Machine Learning Models
Thaer Thaher [1], Hamza Turabieh [2], Hamouda Chantar [3], Mahmoud Saheb [4]
[1] Department of Engineering and Technology Sciences, Arab American University, P.O. Box 240 Jenin, 13 Zababdeh, Palestine
[2] Department of Information Technology, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
[3] Faculty of Information Technology, Sebha University, Sebha 18758, Libya
[4] IT and Computer Engineering College, Palestine Polytechnic University, P.O. Box 198 Hebron, Palestine
Keywords: false information; natural language processing; machine learning; feature selection; meta-heuristics; Twitter
DOI: 10.3390/sym13040556
Source: DOAJ
【 Abstract 】

False information on social media platforms is a significant challenge: rumors, propaganda, and deceptive claims about a person, organization, or service deliberately mislead users. Twitter is one of the most widely used social media platforms, especially in the Arab region, where the number of users is steadily increasing, accompanied by an increase in the rate of fake news. This has drawn researchers' attention to providing a safe online environment free of misleading information. This paper proposes a smart classification model for the early detection of fake news in Arabic tweets, combining Natural Language Processing (NLP) techniques, Machine Learning (ML) models, and the Harris Hawks Optimizer (HHO) as a wrapper-based feature selection approach. An Arabic Twitter corpus of 1862 previously annotated tweets was used to assess the efficiency of the proposed model. Features are extracted with the Bag of Words (BoW) model under different term-weighting schemes. Eight well-known learning algorithms are investigated with varying combinations of features, including user-profile, content-based, and word features. The reported results show that Logistic Regression (LR) with Term Frequency-Inverse Document Frequency (TF-IDF) weighting ranks best. Moreover, feature selection based on the binary HHO (BHHO) algorithm plays a vital role in reducing dimensionality, thereby enhancing the learning model's performance for fake news detection. The proposed BHHO-LR model yields an improvement of 5% over previous works on the same dataset.
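
To make the feature-extraction step more concrete, the following is a minimal sketch of TF-IDF term weighting over a Bag of Words, followed by the application of a binary selection vector of the kind a wrapper method such as binary HHO would search over. It is an illustration under stated assumptions, not the paper's implementation: the textbook weighting tf × log(N / df) is assumed (the paper's exact variant is not given in the abstract), the function names `tfidf` and `select_features` are hypothetical, and the three short English token lists are invented placeholders rather than tweets from the corpus.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, matrix) of TF-IDF weights for tokenized documents.

    Assumes the textbook form tf * log(N / df); other variants exist.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        row = []
        for term in vocab:
            # Relative term frequency; an empty document gets all zeros.
            tf = counts[term] / len(doc) if doc else 0.0
            row.append(tf * math.log(n_docs / df[term]))
        matrix.append(row)
    return vocab, matrix


def select_features(matrix, mask):
    """Keep only the columns whose mask bit is 1.

    The mask plays the role of one candidate solution in a wrapper-based
    binary feature-selection search (hypothetical helper for illustration).
    """
    return [[v for v, keep in zip(row, mask) if keep] for row in matrix]


# Invented placeholder documents, already tokenized.
docs = [
    "breaking news confirmed by officials".split(),
    "shocking rumor breaking now".split(),
    "officials deny rumor".split(),
]
vocab, X = tfidf(docs)

# Example mask keeping only "breaking" and "rumor".
mask = [1 if term in ("breaking", "rumor") else 0 for term in vocab]
X_reduced = select_features(X, mask)
```

In a wrapper setting, each candidate mask would be scored by training the classifier on the reduced matrix and measuring its performance; that evaluation loop and the HHO search itself are omitted here.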

【 License 】

Unknown   
