Entropy
A Feature Subset Selection Method Based On High-Dimensional Mutual Information
Yun Zheng 1

Affiliations:
1. Institute of Developmental Biology and Molecular Medicine, Fudan University, 220 Handan Road, Shanghai 200433, China
2. School of Life Sciences, Fudan University, 220 Handan Road, Shanghai 200433, China
3. School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
Keywords: feature selection; mutual information; entropy; information theory; Markov blanket; classification
DOI: 10.3390/e13040860
Source: MDPI
Abstract
Feature selection is an important step in building accurate classifiers and provides a better understanding of the data sets. In this paper, we propose a feature subset selection method based on high-dimensional mutual information. We also propose to use the entropy of the class attribute as a criterion for determining the appropriate subset of features when building classifiers. We prove that if the mutual information between a feature set X and the class attribute Y equals the entropy of Y, then X is a Markov blanket of Y. We show that in some cases it is infeasible to approximate the high-dimensional mutual information with algebraic combinations of pairwise mutual information in any form. For such data sets, an exhaustive search over all combinations of features is a prerequisite for finding the optimal feature subsets. We show that our approach outperforms existing filter feature subset selection methods on most of the 24 selected benchmark data sets.
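The two claims in the abstract can be illustrated with a small discrete example. The sketch below is not the paper's algorithm; it is a minimal, hypothetical illustration of the criterion I(X;Y) = H(Y) using the standard identity I(X;Y) = H(X) + H(Y) − H(X,Y), on XOR-style toy data where each feature alone carries zero mutual information with the class but the pair saturates H(Y):

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def mutual_information(X_rows, y):
    """High-dimensional mutual information I(X; Y) for a discrete
    feature subset (rows given as hashable tuples) and class labels,
    via I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    joint = list(zip(X_rows, y))
    return entropy(X_rows) + entropy(y) - entropy(joint)

# Toy data (hypothetical): y = x1 XOR x2, so no single feature is
# informative, yet the pair {x1, x2} reaches I(X; Y) = H(Y) and is
# therefore a Markov blanket of Y by the paper's criterion.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y  = [0, 1, 1, 0]

print(mutual_information([(v,) for v in x1], y))      # 0.0
print(mutual_information(list(zip(x1, x2)), y))       # 1.0
print(entropy(y))                                     # 1.0
```

This also shows why pairwise approximations can fail: any algebraic combination of the per-feature terms I(x1;Y) = I(x2;Y) = 0 cannot recover the joint value I({x1,x2};Y) = 1, which is why the abstract argues an exhaustive search over feature combinations is needed for such data sets.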
License
CC BY
This is an open access article distributed under the Creative Commons Attribution License (CC BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Preview
| File | Size | Format | View |
|---|---|---|---|
| RO202003190049664ZK.pdf | 2671 KB | PDF | download |