Journal Article Details
Journal of Computer Science
Optimizing Feature Construction Process for Dynamic Aggregation of Relational Attributes
Rayner Alfred
Keywords: Feature construction; feature transformation; data summarization; genetic algorithm; clustering
DOI: 10.3844/jcssp.2009.864.877
Subject classification: Computer Science (General)
Source: Science Publications
【 Abstract 】

Problem statement: The importance of input representation is well recognized in machine learning. Feature construction is one of the methods used to generate relevant features for learning. This study addressed the question of whether the descriptive accuracy of the DARA algorithm benefits from the feature construction process. Specifically, it discusses the application of a genetic algorithm to optimize the feature construction process that generates input data for the data summarization method called Dynamic Aggregation of Relational Attributes (DARA). Approach: The DARA algorithm was designed to summarize data stored in non-target tables by clustering them into groups, where multiple records stored in non-target tables correspond to a single record stored in a target table. Here, feature construction methods are applied in order to improve the descriptive accuracy of the DARA algorithm. The task therefore involves constructing a relevant set of features for the DARA algorithm by using a genetic-based algorithm. Results: The experimental results show that the quality of the summarized data is directly influenced by the methods used to create the patterns that represent records in the (n×p) TF-IDF weighted frequency matrix. The evaluation of the genetic-based feature construction algorithm showed that the data summarization results can be improved by constructing features with the Cluster Entropy (CE) genetic-based feature construction algorithm. Conclusion: This study showed that the data summarization results can be improved by constructing features with the Cluster Entropy (CE) genetic-based feature construction algorithm.
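
The two quantities named in the abstract, the (n×p) TF-IDF weighted frequency matrix and cluster entropy, can be illustrated with a minimal Python sketch. This is not the DARA implementation: the function names `tfidf_matrix` and `cluster_entropy`, the plain `tf * log(n / df)` weighting, and the size-weighted class-entropy definition are all assumptions chosen for illustration, since the abstract does not give the exact formulas. Each of the n target records is assumed to be represented by a bag of pattern strings built from its non-target records.

```python
import math
from collections import Counter


def tfidf_matrix(bags):
    """Build an n x p TF-IDF matrix from n bags of pattern strings.

    Assumed weighting (not taken from the paper): tf * log(n / df).
    """
    vocab = sorted({p for bag in bags for p in bag})
    n = len(bags)
    # Document frequency: number of bags in which each pattern occurs.
    df = Counter(p for bag in bags for p in set(bag))
    matrix = []
    for bag in bags:
        tf = Counter(bag)
        matrix.append([tf[p] * math.log(n / df[p]) for p in vocab])
    return vocab, matrix


def cluster_entropy(cluster_ids, class_labels):
    """Size-weighted average of the class-label entropy inside each cluster.

    Lower values mean purer clusters. This is one common definition and is
    an assumption here, not the paper's stated fitness function.
    """
    total = len(cluster_ids)
    by_cluster = {}
    for c, y in zip(cluster_ids, class_labels):
        by_cluster.setdefault(c, []).append(y)
    result = 0.0
    for members in by_cluster.values():
        size = len(members)
        counts = Counter(members)
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        result += (size / total) * h
    return result


# Hypothetical toy data: three target records, each a bag of patterns.
bags = [["a=1", "b=2"], ["a=1", "b=3"], ["a=2", "b=3"]]
vocab, matrix = tfidf_matrix(bags)
```

Under these assumptions, a genetic search over candidate feature sets could rebuild the bags for each candidate, cluster the rows of the resulting matrix, and use `cluster_entropy` as the fitness value to minimize.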

【 License 】

Unknown   
