| BMC Genomics | |
| Gene selection and classification for cancer microarray data based on machine learning and similarity measures | |
| Research Article | |
| Zhongxue Chen [1], Xudong Huang [2], Andrew H Sung [3], Lei Chen [4], Qingzhong Liu [4], Mengyu Qiao [5], Jianzhong Liu [6], Zhaohui Wang [7], Youping Deng [8] | |
| [1] Biostatistics Epidemiology Research Design Core, Center for Clinical and Translational Sciences, The University of Texas Health Science Center at Houston, 77030, Houston, TX, USA; Conjugate and Medicinal Chemistry Laboratory, Division of Nuclear Medicine and Molecular Imaging, Department of Radiology, Brigham and Women's Hospital and Harvard Medical School, 02115, Boston, MA, USA; Department of Computer Science and Institute of Complex Additive Systems Analysis, New Mexico Institute of Mining and Technology, 87801, Socorro, NM, USA; Department of Computer Science, Sam Houston State University, 77341, Huntsville, TX, USA; Mathematics and Computer Science, Dept. of Mathematics & Computer Science, South Dakota School of Mines & Technology, 57701-3995, Rapid City, SD, USA; The Chem21 Group, Inc, 1780 Wilson Drive, 60045, Lake Forest, IL, USA; Wuhan University of Science and Technology, 430081, Wuhan, Hubei, China; Cancer Bioinformatics, Rush University Cancer Center, and Department of Internal Medicine, Rush University Medical Center, 60612, Chicago, IL, USA | |
| Keywords: gene selection; microarray; classification; supervised learning; similarity | |
| DOI: 10.1186/1471-2164-12-S5-S1 | |
| Source: Springer | |
【 Abstract 】
Background: Microarray data combine a very large number of variables with a small sample size. Two important issues in microarray data analysis are how to choose genes that provide reliable and accurate prediction of disease status, and how to determine the final gene set that is best for classification. Because genetic markers are associated with one another, the resulting information redundancy can be exploited to potentially reduce the cost of classification in both time and money.

Results: To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. Using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection, and several others.

Conclusions: On average, with popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier, and Random Forest, Recursive Feature Addition outperformed the other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to a random strategy, and that Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.
【 License 】
Unknown
© Liu et al. licensee BioMed Central Ltd 2011. This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.