IEEE Access
Multivariate Feature Ranking With High-Dimensional Data for Classification Tasks
Juan A. Botia [1], Jose Palma [1], Gracia Sanchez [1], Fernando Jimenez [1], Luis Miralles-Pechuan [2]
[1] Department of Information and Communication Engineering, University of Murcia, Murcia, Spain; [2] School of Computer Science, Technological University Dublin, Dublin 7, Ireland
Keywords: High-dimensional data; classification; feature ranking; feature selection; machine learning; correlation
DOI: 10.1109/ACCESS.2022.3180773
Source: DOAJ
【 Abstract 】
In many machine learning classification problems, datasets are high-dimensional and therefore require efficient and effective methods for identifying the relative importance of their attributes and eliminating the redundant and irrelevant ones. Because the search space of candidate attribute subsets is huge, feature selection methods based on attribute subset evaluation are poorly suited to these scenarios, so feature ranking methods are used instead. Most feature ranking methods described in the literature are univariate and therefore do not detect interactions between factors. In this paper, we propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency, which we apply to cancer gene expression and genotype-tissue expression classification tasks using public datasets. We show through statistical tests that the proposed methods outperform state-of-the-art feature ranking methods.
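To make the idea of ranking by pairwise correlation concrete, the following is a minimal sketch and not the authors' method, whose details are not given in this abstract. It assumes numeric features and a numeric class label, and it scores each feature by the mean merit of every pair it belongs to, using the correlation-based feature selection (CFS) merit for a pair, (r_ic + r_jc) / sqrt(2 + 2 r_ij), where r is the absolute Pearson correlation. The function name `pairwise_correlation_ranking` and the mean-over-pairs aggregation are illustrative assumptions.

```python
import numpy as np


def pairwise_correlation_ranking(X, y):
    """Rank features by the mean CFS-style merit of the pairs they belong to.

    Illustrative assumption only: this is NOT the method proposed in the paper.
    X is an (n_samples, n_features) numeric array with at least two
    non-constant columns; y is a numeric class label vector.
    """
    n_features = X.shape[1]
    # Absolute Pearson correlation between each feature and the class label.
    r_fc = np.array(
        [abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(n_features)]
    )
    # Absolute Pearson correlation between every pair of features.
    r_ff = np.abs(np.corrcoef(X, rowvar=False))
    # CFS merit of the pair {i, j}: high class correlation is rewarded,
    # high correlation between the two features (redundancy) is penalized.
    merit = (r_fc[:, None] + r_fc[None, :]) / np.sqrt(2.0 + 2.0 * r_ff)
    # A feature is not paired with itself.
    np.fill_diagonal(merit, 0.0)
    scores = merit.sum(axis=1) / (n_features - 1)
    # Feature indices sorted from highest to lowest score.
    order = np.argsort(-scores)
    return order, scores
```

Calling `pairwise_correlation_ranking(X, y)` returns the feature indices from best to worst together with their scores. Unlike a univariate ranking, the score of a feature here depends on the other features in the dataset, because each pair's merit shrinks as the two features become more correlated with each other. Since all pairs are evaluated, the cost grows quadratically with the number of features.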
【 License 】
Unknown