2nd Annual International Conference on Information System and Artificial Intelligence
A pruning algorithm for Meta-blocking based on cumulative weight
Physics; Computer Science
Zhang, Fulin^1 ; Gao, Zhipeng^1 ; Niu, Kun^2
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China^1
School of Software Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China^2
Keywords: Blocking method; Blocking process; Cumulative weight; Entity resolutions; Heterogeneous information; Large datasets; Pruning algorithms; Quadratic complexity
Others: https://iopscience.iop.org/article/10.1088/1742-6596/887/1/012058/pdf DOI: 10.1088/1742-6596/887/1/012058
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】
Entity Resolution is an important process in data cleaning and data integration. It usually employs a blocking method to avoid the quadratic complexity of comparing every pair of entities when scaling to large data sets. Meta-blocking can perform better in the context of highly heterogeneous information spaces, yet its precision and efficiency still have room for improvement. In this paper, we present a new pruning algorithm for Meta-blocking. It achieves higher precision than the existing WEP algorithm at a small cost in recall. In addition, it reduces the runtime of the blocking process. We evaluate our proposed method on five real-world data sets.
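For orientation, the WEP (Weighted Edge Pruning) baseline named in the abstract can be sketched as follows. This is a minimal illustration, not the cumulative-weight algorithm proposed in the paper, whose details are not given here. It assumes edges are weighted by the number of blocks two entities share and that edges below the mean weight are discarded; the function names, block keys, and entity identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_blocking_graph(blocks):
    """Weight each co-occurring entity pair by the number of blocks it shares.

    `blocks` maps a block key to the list of entity ids in that block.
    The weighting (count of common blocks) is an assumption for this sketch.
    """
    weights = defaultdict(int)
    for members in blocks.values():
        # Sort so that each undirected edge has one canonical (a, b) key.
        for a, b in combinations(sorted(set(members)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def weighted_edge_pruning(weights):
    """Keep only the edges whose weight is at least the mean edge weight."""
    if not weights:
        return {}
    threshold = sum(weights.values()) / len(weights)
    return {edge: w for edge, w in weights.items() if w >= threshold}


# Hypothetical toy blocks: three blocking keys over four entities.
blocks = {
    "smith": ["e1", "e2", "e3"],
    "john": ["e1", "e2"],
    "nyc": ["e2", "e3", "e4"],
}
graph = build_blocking_graph(blocks)
kept = weighted_edge_pruning(graph)
print(kept)
```

In this toy input, five candidate pairs are reduced to the two that share more than one block, which illustrates the precision-for-recall trade-off the abstract refers to.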