Technical Report Details
On the Helmholtz Principle for Data Mining
Balinsky, Alexander; Balinsky, Helen; Simske, Steven
HP Development Company
Keywords: extraction; feature extraction; unusual behavior detection; Helmholtz principle; mining textual and unstructured datasets
RP-ID: HPL-2010-133
Subject classification: Computer Science (General)
United States | English
Source: HP Labs
【Abstract】
We present novel algorithms for feature extraction and change detection in unstructured data, primarily textual and sequential data. Keyword and feature extraction is a fundamental problem in text data mining and document processing, and most document processing applications depend directly on the quality and speed of keyword extraction algorithms. In this article, we develop a novel approach to rapid change detection in data streams and documents. It is based on ideas from image processing, especially the Helmholtz Principle from the Gestalt theory of human perception. Applied to keyword extraction, it delivers fast and effective parameter-free tools for identifying meaningful keywords. We also define a level of meaningfulness for keywords, which can be used to adjust the set of extracted keywords to the needs of a given application.
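The abstract does not spell out the test, so the following is only a minimal sketch, assuming the a contrario formulation commonly paired with the Helmholtz principle: a document is split into N parts, a word has K occurrences in total, m of them fall in one part, the number of false alarms is NFA = C(K, m) / N^(m-1), a word is flagged when NFA < eps, and its level of meaningfulness is -(1/m) log NFA. The function names `log_nfa` and `meaningful_words`, the segmentation into parts, and the threshold `eps` are hypothetical illustrations; the report's exact definitions may differ.

```python
import math
from collections import Counter


def log_nfa(K, m, N):
    """Natural log of the assumed NFA = C(K, m) / N**(m - 1).

    lgamma is used instead of exact binomials so large counts do not
    produce huge integers.
    """
    log_binom = math.lgamma(K + 1) - math.lgamma(m + 1) - math.lgamma(K - m + 1)
    return log_binom - (m - 1) * math.log(N)


def meaningful_words(parts, eps=1.0):
    """Flag (part index, word, meaningfulness) triples with NFA < eps.

    `parts` is a hypothetical segmentation of a document: a list of
    token lists. Under the assumed null model every occurrence of a word
    lands in one of the N parts uniformly at random, so a word bunched
    into one part gets a tiny NFA. Results are sorted by meaningfulness.
    """
    N = len(parts)
    totals = Counter(tok for part in parts for tok in part)
    log_eps = math.log(eps)
    results = []
    for idx, part in enumerate(parts):
        for word, m in Counter(part).items():
            lnfa = log_nfa(totals[word], m, N)
            if lnfa < log_eps:
                # Assumed level of meaningfulness: -(1/m) * log NFA
                results.append((idx, word, -lnfa / m))
    return sorted(results, key=lambda r: -r[2])
```

For example, with four parts where "x" occurs four times, all in the first part, while "a", "b", and "c" occur once in every part, only "x" is flagged: its NFA is 1/4^3 and its meaningfulness is 3 ln 4 / 4 (about 1.04). A word occurring once in a part always has NFA = K >= 1, so with eps = 1 it is never flagged; raising or lowering `eps` loosens or tightens the keyword set.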
Attachment: RO201804100002724LZ (PDF, 704 KB)