Journal Article Details
Annals of Emerging Technologies in Computing
Similarity Detection of Time-Sensitive Online News Articles Based on RSS Feeds and Contextual Data
article
Daoud, Mohammad [1]
[1] American University of Madaba
Keywords: Arabic NLP; News aggregators; Recommendation systems; Semantic similarity; Web personalization
DOI: 10.33166/AETiC.2023.01.006
Subject classification: Electronic and Electrical Engineering
Source: International Association for Educators and Researchers (IAER)
【 Abstract 】
This article tackles the problem of measuring similarity between time-sensitive online news articles. It approaches the problem with a novel methodology that uses supervised learning algorithms with carefully selected semantic, lexical, and temporal features, covering both content and context. The proposed approach considers not only the textual content, a well-studied signal that may yield misleading results, but also the context, community engagement, and community-deduced importance of each news article. The paper details the major procedures: title-pair pre-processing, analysis of lexical units, feature engineering, and similarity measures. Thousands of web articles are published every second, so it is essential to determine their similarity efficiently, without spending time on unnecessary text processing of the article bodies. Hence, the proposed approach focuses on short content (titles) and context. The experiment showed high precision and accuracy on a Really Simple Syndication (RSS) dataset of 8,000 Arabic news article pairs collected automatically from 10 different news sources. The proposed approach achieved an accuracy of 0.81, and contextual features increased both accuracy and precision. The algorithm's output reached a Pearson correlation coefficient of 0.89 with the evaluations of two human judges. The results outperform state-of-the-art systems on Arabic news articles.
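To make the idea of combining lexical and temporal signals for a title pair concrete, here is a minimal sketch. It is not the paper's method: the abstract does not specify the features or the pre-processing, so the whitespace tokenizer, the Jaccard overlap, the hours-apart feature, and every function name below are illustrative assumptions.

```python
from datetime import datetime


def tokenize(title):
    """Lowercase whitespace split; an assumed stand-in for real Arabic pre-processing."""
    return set(title.lower().split())


def jaccard(a, b):
    """Lexical overlap of two token sets: |a & b| / |a | b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def hours_apart(t1, t2):
    """Temporal feature: absolute publication-time gap in hours."""
    return abs((t1 - t2).total_seconds()) / 3600.0


def pair_features(title_a, published_a, title_b, published_b):
    """Feature vector for one title pair (hypothetical layout: [lexical, temporal])."""
    return [
        jaccard(tokenize(title_a), tokenize(title_b)),
        hours_apart(published_a, published_b),
    ]


if __name__ == "__main__":
    features = pair_features(
        "Storm hits coast", datetime(2023, 1, 1, 12, 0),
        "storm hits city", datetime(2023, 1, 1, 9, 0),
    )
    print(features)
```

Vectors of this kind, extended with whatever semantic and contextual features are available, could then be passed to any supervised classifier trained on labeled similar and dissimilar pairs.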
【 License 】

CC BY   
