International Journal of Computer Science and Security
Testing Various Similarity Metrics and their Permutations with Clustering Approach in Context Free Data Cleaning
Paresh V Virparia, Sohil Dineshkumar Pandya
Keywords: Context free data cleaning; Clustering; Sequence similarity metrics
Source: Computer Science and Security
【 Abstract 】

Organizations can sustain growth in the knowledge era through proficient data analysis, which relies heavily on data quality. This paper emphasizes the use of sequence similarity metrics with a clustering approach in context-free data cleaning to improve data quality by reducing noise. The authors propose an algorithm that tests the suitability of a value for correcting other values of an attribute, based on the distance between them. Sequence similarity metrics such as Needleman-Wunsch, Jaro-Winkler, Chapman Ordered Name Similarity, and Smith-Waterman are used to compute the distance between two values. Experimental results show how the approach can effectively clean data without reference data.
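
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not spell out; it is one plausible reading under stated assumptions. The function names `nw_similarity` and `clean_values`, the `threshold` and `min_count` parameters, and the rule "values that occur at least `min_count` times act as cluster centres" are all hypothetical. The metric is a simplified Needleman-Wunsch-style global alignment cost (unit mismatch cost, configurable gap cost) normalised to the range 0 to 1, standing in for the metrics the paper tests.

```python
from collections import Counter


def nw_similarity(a: str, b: str, gap_cost: float = 1.0) -> float:
    """Needleman-Wunsch-style global alignment cost, normalised to [0, 1].

    Assumption: match costs 0, mismatch costs 1, each gap costs `gap_cost`.
    A result of 1.0 means the strings are identical.
    """
    if not a and not b:
        return 1.0
    # prev[j] holds the cost of aligning the processed prefix of `a` with b[:j].
    prev = [j * gap_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap_cost]
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0.0 if ca == cb else 1.0)
            delete = prev[j] + gap_cost
            insert = curr[j - 1] + gap_cost
            curr.append(min(substitute, delete, insert))
        prev = curr
    # Upper bound on the alignment cost, used for normalisation.
    worst = max(len(a), len(b)) * max(1.0, gap_cost)
    return 1.0 - prev[-1] / worst


def clean_values(values, threshold=0.75, min_count=2):
    """Replace rare values with the most similar frequent value.

    Assumption: frequent values (count >= min_count) are treated as cluster
    centres; no external reference data is consulted.
    """
    counts = Counter(values)
    centres = [v for v, c in counts.items() if c >= min_count]
    cleaned = []
    for v in values:
        if not centres or v in centres:
            cleaned.append(v)
            continue
        best = max(centres, key=lambda c: nw_similarity(v, c))
        # Correct the value only when it is close enough to a centre.
        cleaned.append(best if nw_similarity(v, best) >= threshold else v)
    return cleaned
```

With the illustrative input `["Ahmedabad", "Ahmedabad", "Ahmedabad", "Ahmedabd", "Ahemdabad", "Surat", "Surat", "Sura", "Mumbai"]`, the misspellings `Ahmedabd`, `Ahemdabad`, and `Sura` are mapped to their nearest frequent value, while `Mumbai` is left unchanged because it is not close to any centre. The threshold controls the trade-off between missed corrections and false corrections and would need tuning per attribute.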

【 License 】

Unknown   
