Thesis Details
Variable Selection in Multiclass Support Vector Machine and Applications in Genomic Data Analysis
Huang, Lingkang; Dr. Zhao-Bang Zeng, Committee Chair; Dr. Hao Helen Zhang, Committee Co-Chair
University: North Carolina State University
Keywords: multi-class classification; support vector machine; microarray; variable selection
Others: https://repository.lib.ncsu.edu/bitstream/handle/1840.16/5240/etd.pdf?sequence=1&isAllowed=y
United States | English
【 Abstract 】

Microarray techniques provide new insights into cancer diagnosis using gene expression profiles. Molecular diagnosis based on high-throughput genomic data sets presents a major challenge due to the overwhelming number of variables and the complex multi-class nature of tumor samples. In this thesis, the author first tackled a multi-class problem related to liver toxicity severity prediction using Random Forest and GEMS-SVM (Gene Expression Model Selector using Support Vector Machine). However, the original SVM regularization formulation does not accommodate variable selection. Most existing approaches, including GEMS-SVM, handle this issue by selecting genes prior to classification, which does not account for the correlation among genes because the genes are selected by univariate ranking. In this thesis, the author developed new multi-class SVM (MSVM) approaches that perform multi-class classification and variable selection simultaneously and learn optimal classifiers by considering all classes and all genes at the same time. The original multi-class SVM proposed by Crammer and Singer (2001) does not perform variable selection. Using the MSVM loss function proposed by Crammer and Singer (2001), the author developed new variable selection approaches for both linear and non-linear classification problems. For linear classification problems, four different sparse regularization terms were each included in the objective function in turn. For non-linear classification problems, two approaches were developed: the first builds non-linear MSVMs via basis function transformation, and the second builds non-linear MSVMs via kernel functions. The newly developed methods were applied to both simulated and real data sets. The results demonstrated that the new methods could select a much smaller number of genes than other existing methods while maintaining high classification accuracy in predicting tumor subtypes. The combination of high accuracy and a small number of genes makes the new methods powerful tools for disease diagnostics based on expression data and for identifying targets for therapeutic intervention.
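
For readers unfamiliar with the setup, the linear case described in the abstract can be sketched as the Crammer and Singer (2001) multi-class hinge loss plus a sparsity-inducing penalty. The notation below is an assumption made for illustration and is not taken from the thesis: $x_i \in \mathbb{R}^p$ is the expression profile of sample $i$, $y_i \in \{1,\dots,K\}$ its class, $w_k$ the weight vector of class $k$, $\lambda$ a tuning parameter, and $\delta_{y_i,k}$ equals 1 when $k = y_i$ and 0 otherwise. The abstract does not name the four sparse regularization terms, so the $L_1$ penalty shown for $J$ is only one plausible example, and the $1/n$ scaling is likewise an assumed convention.

```latex
% Illustrative sketch only: notation and the L1 choice for J are assumptions,
% not the thesis's stated formulation.
\begin{equation*}
\min_{w_1,\dots,w_K}\;
\frac{1}{n}\sum_{i=1}^{n}
\Bigl[\,\max_{k\in\{1,\dots,K\}}\bigl(w_k^{\top}x_i + 1 - \delta_{y_i,k}\bigr)
      - w_{y_i}^{\top}x_i\Bigr]
\;+\;\lambda\,J(w_1,\dots,w_K),
\qquad
J(w_1,\dots,w_K)=\sum_{k=1}^{K}\sum_{j=1}^{p}\lvert w_{kj}\rvert .
\end{equation*}
```

Under a penalty of this kind, a gene $j$ is effectively dropped from the classifier when $w_{kj} = 0$ for every class $k$, which is how classification and variable selection can happen in a single optimization rather than in a separate univariate ranking step.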
