Journal Article Details
Journal of Multimedia
Study of Chinese Text Similarity Based on Difference Factor in Word-Number
Keywords: Segmentation; Factor in Word-Number; Similarity
Others: 945820
DOI: 10.4304/jmm.9.7.865-872
【 Abstract 】
Text similarity calculation is a basic task in Chinese information processing. A high-quality text similarity method must be both accurate and efficient. That is, it should compare texts at the level of natural-language meaning and, based on a full understanding of the semantics of the author or text source, arrive at similarity judgments close to those of a human reader. At the same time, it should be efficient enough to save processing time when facing large amounts of text. After surveying the domestic and foreign literature and analyzing the current state of similarity calculation, this paper presents a new method intended to improve the performance of similarity calculation: a Chinese text similarity algorithm based on word-number difference. The algorithm combines the traditional statistics-based approach with the narrow semantic approach, that is, it combines statistical efficiency with semantic accuracy. Combining the advantages of the statistical and semantic categories also means having to face and overcome the disadvantages of both kinds of methods. This paper takes the difference in word-number as the breakthrough point and exploits the diversity of Chinese word-number, combined with word frequency, number, and meaning, in order to extend word similarity calculation to text similarity calculation. Finally, using a self-built small text set as the test object, the similarity calculations of different methods are compared in a laboratory environment. The results show that the similarity calculation method based on difference in word-number performs better than the traditional statistical and semantic methods. Through manual comparison of the test results in terms of accuracy and segmentation speed, this work provides a new approach for Chinese text similarity calculation.
【 License 】

© 2006-2014 by ACADEMY PUBLISHER – All rights reserved.
