Journal Article Details
BMC Bioinformatics
Feature weight estimation for gene selection: a local hyperlinear learning approach
Hongmin Cai [2], Peiying Ruan [3], Michael Ng [1], Tatsuya Akutsu [3]
[1] Department of Mathematics, Hong Kong Baptist University, Hong Kong, China
[2] School of Computer Science and Engineering, South China University of Technology, Guangdong, China
[3] Institute for Chemical Research, Kyoto University, Kyoto, Japan
Keywords: KNN; RELIEF; Classification; Local hyperplane; Feature weighting
DOI: 10.1186/1471-2105-15-70
Received: 2013-10-15; Accepted: 2014-03-06; Published: 2014
【 Abstract 】

Background

Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments; nevertheless, it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes buried in high-dimensional, irrelevant noise. RELIEF is a popular and widely used approach to feature selection owing to its low computational cost and high accuracy. However, RELIEF-based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers.
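For context, classic RELIEF scores each feature by how strongly it separates a randomly sampled instance from its nearest neighbor of the opposite class (the "miss") relative to its nearest neighbor of the same class (the "hit"). The following minimal Python/NumPy sketch implements this two-class update; the function name and sampling details are illustrative, not taken from the paper.

import numpy as np

def relief_weights(X, y, n_iters=100, seed=0):
    # Classic two-class RELIEF (illustrative sketch): sample an instance,
    # find its nearest hit (same class) and nearest miss (other class),
    # and move each feature's weight toward the miss separation and away
    # from the hit separation.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)   # L1 distance to every sample
        dists[i] = np.inf                      # never pick the sample itself
        hit = np.argmin(np.where(y == y[i], dists, np.inf))
        miss = np.argmin(np.where(y != y[i], dists, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iters

Because each update hinges on a single nearest hit and miss, one noisy or mislabeled neighbor can swing the weights, which is the instability noted above.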

Results

We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR builds on RELIEF, estimating feature weights by classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than the global measurement typically used in existing methods. The weights obtained by our method are highly robust to degradation from noisy features, even in data of very high dimensionality. To demonstrate the performance of our method, extensive classification experiments were carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB).
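As a rough illustration of the local-approximation idea (a sketch under stated assumptions, not the authors' exact LHR procedure), the Python/NumPy code below replaces RELIEF's single nearest hit/miss points with least-squares projections onto the local hyperplanes spanned by the k nearest same-class and opposite-class neighbors, in the spirit of HKNN; the function names, the Euclidean neighbor search and the non-negativity clipping are all assumptions.

import numpy as np

def hyperplane_residual(x, N):
    # Per-feature residual of x after least-squares projection onto the
    # affine hull of the neighbor rows N (k x d): points on the hull are
    # N[0] + V @ a, where V holds the directions from N[0] to the others.
    V = (N[1:] - N[0]).T                         # shape (d, k-1)
    a, *_ = np.linalg.lstsq(V, x - N[0], rcond=None)
    return x - (N[0] + V @ a)

def local_hyperplane_weights(X, y, k=5, n_iters=50, seed=0):
    # RELIEF-style margin update with nearest points replaced by local
    # hyperplanes; assumes each class holds at least k + 1 samples.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                        # exclude the sample itself
        hits = np.argsort(np.where(y == y[i], dists, np.inf))[:k]
        misses = np.argsort(np.where(y != y[i], dists, np.inf))[:k]
        r_hit = hyperplane_residual(X[i], X[hits])
        r_miss = hyperplane_residual(X[i], X[misses])
        w += np.abs(r_miss) - np.abs(r_hit)      # enlarge the local margin
    return np.maximum(w / n_iters, 0.0)

Because each update averages over a k-neighbor hyperplane rather than a single point, one noisy neighbor shifts the projection only slightly, which is the intuition behind the robustness claimed here.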

Conclusion

Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability across various classification algorithms.

【 License 】

2014 Cai et al.; licensee BioMed Central Ltd.

【 Preview 】

Attachments
File                    Size    Format
20150117022436504.pdf   673 KB  PDF
Figure 2.               78 KB   Image
Figure 1.               48 KB   Image