Journal Article Details
IEEE Access
Context-Aware Misinformation Detection: A Benchmark of Deep Learning Architectures Using Word Embeddings
Ciprian-Octavian Truica [1], Vlad-Iulian Ilie [1], Elena-Simona Apostol [1], Adrian Paschke [2]
[1] Computer Science and Engineering Department, Faculty of Automatic Control and Computers, University POLITEHNICA of Bucharest, Bucharest, Romania; [2] Fraunhofer Institute for Open Communication Systems (FOKUS), Berlin, Germany
Keywords: Misinformation detection; deep learning; multi-class text classification; word embeddings; text preprocessing; benchmarking
DOI: 10.1109/ACCESS.2021.3132502
Source: DOAJ
【 Abstract 】

New mass media paradigms for information distribution have emerged with the digital age. With new digital-enabled mass media, the communication process is centered around the user, while multimedia content is the new identity of news. Thus, the media landscape has shifted from mass media to personalized social media. While this progress brings advantages, it also carries the risk of being detrimental to society through the emergence of misinformation (false or inaccurate information) and disinformation (intentionally spreading misinformation) in the form of fake news. Fake news is a tool used to manipulate public opinion on particular topics, distort public perceptions, and generate social unrest while lacking the rigor of traditional journalism. Motivated by this current, real-world problem, we train multiple Deep Learning architectures for multi-class classification and compare their performance in detecting the veracity of news articles. To obtain accurate misinformation detection models, we employ a large dataset containing 100 000 news articles labeled with ten classes (one with real news and the rest with different types of fake news). We use two preprocessing techniques, one simple and the other very aggressive, to clean the dataset. We also employ three word embeddings that preserve word context, i.e., Word2Vec, FastText, and GloVe, both pre-trained and trained on our dataset, to vectorize the preprocessed dataset. For the misinformation detection task, we train a Logistic Regression model as a baseline and compare its results with the performance of ten Deep Learning architectures. We obtain the best results using a Recurrent Convolutional Neural Network based architecture. The experimental results show that the models are highly dependent on the text preprocessing and the word embedding employed.
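
The abstract contrasts a simple and a very aggressive preprocessing technique without giving their exact steps. The following is a minimal sketch of what such a pair might look like, not the authors' pipeline: the function names, the tiny stopword list, and the three-character minimum token length are all assumptions made for illustration.

```python
import re
import string
from typing import List

# Hypothetical stopword list, for illustration only (not taken from the paper).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "on", "at"}


def simple_preprocess(text: str) -> List[str]:
    """Light cleaning (assumed): lowercase, split on whitespace,
    and strip punctuation from the edges of each token."""
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def aggressive_preprocess(text: str) -> List[str]:
    """Heavier cleaning (assumed): keep only alphabetic runs, then drop
    stopwords and tokens shorter than three characters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS and len(tok) >= 3]
```

Calling both functions on the same sentence shows the trade-off: the simple variant keeps numbers and function words, which preserves more context for the embedding, while the aggressive variant yields a much smaller vocabulary.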

【 License 】

Unknown   
