Journal Article Details
Entropy
A Feature Subset Selection Method Based On High-Dimensional Mutual Information
Yun Zheng 1
[1] Institute of Developmental Biology and Molecular Medicine, Fudan University, 220 Handan Road, Shanghai 200433, China
[2] School of Life Sciences, Fudan University, 220 Handan Road, Shanghai 200433, China
[3] School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
Keywords: feature selection; mutual information; entropy; information theory; Markov blanket; classification
DOI  :  10.3390/e13040860
Source: MDPI
【 Abstract 】

Feature selection is an important step in building accurate classifiers and provides a better understanding of the data sets. In this paper, we propose a feature subset selection method based on high-dimensional mutual information. We also propose to use the entropy of the class attribute as a criterion to determine the appropriate subset of features when building classifiers. We prove that if the mutual information between a feature set X and the class attribute Y equals the entropy of Y, then X is a Markov blanket of Y. We show that in some cases it is infeasible to approximate the high-dimensional mutual information with algebraic combinations of pairwise mutual information in any form. For such data sets, an exhaustive search over all combinations of features is a prerequisite for finding the optimal feature subsets for classification. We show that our approach outperforms existing filter feature subset selection methods on most of the 24 selected benchmark data sets.
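The two claims in the abstract can be illustrated with a toy sketch (not the paper's algorithm; the function names and the XOR data set below are illustrative assumptions). A plug-in estimate of the joint mutual information I(X; Y) = H(Y) − H(Y | X) on an XOR-style class shows that each feature alone carries zero pairwise MI with Y, while the pair together reaches H(Y) — the condition the paper uses to identify a Markov blanket of Y:

```python
from collections import Counter
from math import log2

def entropy(values):
    """Plug-in Shannon entropy H(Y) in bits of a sequence of symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def joint_mi(features, y):
    """Plug-in estimate of I(X; Y) for a feature subset X (a list of
    columns) and class labels y, via I(X; Y) = H(Y) - H(Y | X)."""
    n = len(y)
    rows = list(zip(*features))              # joint value of X per sample
    groups = {}
    for r, label in zip(rows, y):            # partition y by the value of X
        groups.setdefault(r, []).append(label)
    # H(Y | X) = sum over x of p(x) * H(Y | X = x)
    h_y_given_x = sum(len(ls) / n * entropy(ls) for ls in groups.values())
    return entropy(y) - h_y_given_x

# Toy data: y = x1 XOR x2. Neither feature alone is informative,
# but jointly they determine y exactly.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y  = [0, 1, 1, 0]
print(joint_mi([x1], y))       # 0.0 — pairwise MI misses the dependence
print(joint_mi([x1, x2], y))   # 1.0 — equals H(Y), so {x1, x2} is a
print(entropy(y))              # 1.0    Markov blanket of Y here
```

Because I({x1}; Y) and I({x2}; Y) are both zero, no algebraic combination of pairwise terms can recover I({x1, x2}; Y) = 1 bit, which is the abstract's point about the necessity of evaluating feature subsets jointly.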

【 License 】

CC BY   
This is an open access article distributed under the Creative Commons Attribution License (CC BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

【 Preview 】
Attachments
File                     Size    Format  View
RO202003190049664ZK.pdf  2671 KB PDF     download
Document Metrics
Downloads: 16    Views: 26