Multiple descriptions of the same entity from different sources inevitably lead to inconsistent data. Among conflicting pieces of information, which one is the most trustworthy? How can a fraudulent rumor be detected? It is unrealistic to curate and validate the trustworthiness of every piece of information, because human labeling is costly and experts are scarce. Much research has shown that taking the quality of information providers into account when finding the truth for each entity can improve the performance of data integration. Because the quality of data sources varies, it is hard to find a general solution that works for every case. We therefore start from a general setting of truth analysis and then narrow down to two basic problems in data integration. First, we propose a general framework for numerical data that allows flexible definition of the loss function; source quality is represented by a vector that models source credibility in different error intervals. Second, we propose a new method called No Truth Truth Model (NTTM) to deal with the truth existence problem in low-quality data. Preliminary experiments on real stock data and slot filling data show promising results.
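To make the idea of weighting sources by quality concrete, the following is a minimal sketch of a generic iterative truth-discovery loop for numerical claims. It is not the framework or the NTTM method described above: it assumes a squared-error loss, a weighted-mean truth estimate, a single scalar weight per source (rather than a quality vector over error intervals), and a negative-log weight update. The function name `discover_truths`, its parameters, and the toy data are all hypothetical.

```python
import math


def discover_truths(claims, n_iter=20, eps=1e-9):
    """Estimate one numeric truth per entity and one weight per source.

    claims: dict mapping entity -> {source: claimed numeric value}.
    Assumptions (illustrative only): squared-error loss, weighted-mean
    truth estimate, scalar source weight = -log(share of total loss).
    """
    sources = sorted({s for obs in claims.values() for s in obs})
    weights = {s: 1.0 for s in sources}  # start by trusting all sources equally
    truths = {}

    for _ in range(n_iter):
        # Truth step: weighted mean of the claims made about each entity.
        for entity, obs in claims.items():
            total = sum(weights[s] for s in obs)
            truths[entity] = sum(weights[s] * v for s, v in obs.items()) / total

        # Weight step: a source with a smaller accumulated loss gets a larger weight.
        losses = {s: eps for s in sources}  # eps avoids log(0) for a perfect source
        for entity, obs in claims.items():
            for s, v in obs.items():
                losses[s] += (v - truths[entity]) ** 2
        total_loss = sum(losses.values())
        # The added eps keeps every weight strictly positive.
        weights = {s: -math.log(losses[s] / total_loss) + eps for s in sources}

    return truths, weights


if __name__ == "__main__":
    # Hypothetical toy data: sources A and B roughly agree, C deviates.
    toy_claims = {
        "stock_1": {"A": 100.0, "B": 100.2, "C": 110.0},
        "stock_2": {"A": 50.0, "B": 50.1, "C": 45.0},
    }
    est_truths, est_weights = discover_truths(toy_claims)
    print(est_truths)
    print(est_weights)
```

Swapping the squared error for another loss only changes the weight step, which is the kind of flexibility the abstract attributes to its framework; the weighted mean in the truth step is the minimizer for squared error specifically and would need to change with the loss.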
Integrating multiple conflicting sources by truth discovery and source quality estimation