Thesis details
Integrating multiple conflicting sources by truth discovery and source quality estimation
Zhi, Shi; Han, Jiawei
Keywords: Truth Discovery; Data Integration; Data Quality
Others: https://www.ideals.illinois.edu/bitstream/handle/2142/50493/Shi_Zhi.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Multiple descriptions of the same entity from different sources inevitably lead to inconsistent data. Among conflicting pieces of information, which one is the most trustworthy? How can a false rumor be detected? It is unrealistic to curate and validate the trustworthiness of every piece of information, because human labeling is costly and experts are scarce. Much prior work on finding the truth for each entity has shown that taking the quality of information providers into account can improve the performance of data integration. Because data sources differ in quality, it is hard to find a general solution that works in every case. We therefore start from a general setting of truth analysis and then narrow down to two basic problems in data integration. We first propose a general framework for numerical data that allows a flexible choice of loss function; source quality is represented by a vector that models source credibility in different error intervals. We then propose a new method, the No Truth Truth Model (NTTM), to address the truth existence problem in low-quality data. Preliminary experiments on real stock data and slot filling data show promising results.
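
The idea of estimating truths and source quality together can be illustrated with a minimal sketch. It is not the thesis's framework or NTTM: it is a generic iterative scheme under stated assumptions, with a single scalar weight per source instead of the per-error-interval vector the abstract mentions. The function names, the log-ratio weight update, and the toy price data are all hypothetical and chosen only for illustration.

```python
import math


def abs_loss(value, truth):
    """Absolute-error loss; any non-negative loss function can be swapped in."""
    return abs(value - truth)


def discover_truths(claims, loss=abs_loss, iterations=10):
    """Jointly estimate truths and source weights (illustrative sketch only).

    claims: {source: {entity: numeric value}}
    Returns (truths, weights).
    """
    sources = sorted(claims)
    entities = sorted({e for s in sources for e in claims[s]})
    weights = {s: 1.0 for s in sources}  # assumption: all sources start equal
    truths = {}
    for _ in range(iterations):
        # Truth step: for each entity, pick the claimed value that minimizes
        # the weighted loss against all claims (assumption: the truth is
        # restricted to values that some source actually reported).
        for e in entities:
            observed = [(s, claims[s][e]) for s in sources if e in claims[s]]
            candidates = sorted({v for _, v in observed})
            truths[e] = min(
                candidates,
                key=lambda c: sum(weights[s] * loss(v, c) for s, v in observed),
            )
        # Weight step: a source with a smaller total loss gets a larger weight.
        # The small constant keeps the logarithm finite for a perfect source.
        losses = {
            s: sum(loss(v, truths[e]) for e, v in claims[s].items()) + 1e-9
            for s in sources
        }
        total = sum(losses.values())
        weights = {s: -math.log(losses[s] / total) for s in sources}
    return truths, weights


# Hypothetical toy data: three sources reporting prices for three entities.
claims = {
    "A": {"x": 10.0, "y": 20.0, "z": 30.0},
    "B": {"x": 10.1, "y": 20.0, "z": 30.2},
    "C": {"x": 15.0, "y": 12.0, "z": 30.0},
}
truths, weights = discover_truths(claims)
print(truths)
print(weights)
```

In this toy data, source C disagrees strongly with the other two on `x` and `y`, so it ends up with the lowest weight and has little influence on the estimated truths. Passing a different `loss` function changes how disagreement is penalized without changing the rest of the loop.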

Attachments
File: Integrating multiple conflicting sources by truth discovery and source quality estimation
Size: 1368KB
Format: PDF