BMC Bioinformatics
Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems
Research Article
Philippe Besse [1], Kim-Anh Lê Cao [2], Simon Boitard [3]
[1] Institut de Mathématiques de Toulouse, Université de Toulouse et CNRS (UMR 5219), F-31062, Toulouse, France
[2] Queensland Facility for Advanced Bioinformatics, University of Queensland, 4072, St Lucia, QLD, Australia
[3] UMR444 Laboratoire de Génétique Cellulaire, INRA, BP 52627, F-31326, Castanet Tolosan, France
Keywords: Partial Least Squares; Variable Selection; Linear Discriminant Analysis; Lasso; Singular Vector
DOI: 10.1186/1471-2105-12-253
Received 2010-11-03, accepted 2011-06-22, published 2011
Source: Springer
【 Abstract 】
Background: Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits.
Results: A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework.
Conclusions: sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.
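To make the idea of sparse PLS used for multiclass variable selection more concrete, here is a minimal single-component sketch in plain Python. It is not the mixOmics implementation and is not taken from the article: it assumes the class labels are coded as a centered dummy matrix, and that sparsity comes from soft-thresholding (a Lasso-type penalty) the singular vector associated with the variables. The function names and the penalty parameter `lam` are hypothetical and chosen only for illustration.

```python
import math

def center_columns(rows):
    """Subtract the column mean from every entry."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def dummy_matrix(labels):
    """One indicator column per class (classes in sorted order)."""
    classes = sorted(set(labels))
    return [[1.0 if lab == c else 0.0 for c in classes] for lab in labels]

def soft_threshold(values, lam):
    """Shrink each value toward zero by lam; small values become zero."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]

def normalise(vec):
    """Scale to unit Euclidean norm; an all-zero vector is returned as is."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def sparse_plsda_component(X, labels, lam, n_iter=100):
    """One sparse component: returns (u, v), where u holds the variable
    loadings (zeros mean 'not selected') and v the class loadings.
    lam is a hypothetical penalty; larger values select fewer variables."""
    Xc = center_columns(X)
    Yc = center_columns(dummy_matrix(labels))
    n, p, K = len(Xc), len(Xc[0]), len(Yc[0])
    # Cross-product matrix M = Xc^T Yc (p variables x K classes).
    M = [[sum(Xc[i][j] * Yc[i][k] for i in range(n)) for k in range(K)]
         for j in range(p)]
    # Start from the row of M with the largest norm.
    v = normalise(max(M, key=lambda row: sum(x * x for x in row)))
    u = [0.0] * p
    for _ in range(n_iter):
        # Penalised update of the variable loadings, then class loadings.
        u = normalise(soft_threshold(
            [sum(M[j][k] * v[k] for k in range(K)) for j in range(p)], lam))
        v = normalise([sum(M[j][k] * u[j] for j in range(p)) for k in range(K)])
    return u, v

# Toy data: variables 0 and 1 separate the classes, 2 and 3 do not.
X = [[5.0, 0.0, 1.0, 0.2], [5.2, 0.1, 0.9, 0.1],
     [0.0, 5.0, 1.1, 0.2], [0.1, 5.1, 1.0, 0.1],
     [0.0, 0.0, 0.9, 0.2], [0.1, 0.1, 1.1, 0.1]]
labels = ["a", "a", "b", "b", "c", "c"]
u, v = sparse_plsda_component(X, labels, lam=1.0)
selected = [j for j, uj in enumerate(u) if uj != 0.0]
print(selected)
```

On this toy data the variables with non-zero loadings are the two that differ between classes. In practice the amount of sparsity would be tuned (for example by cross-validation), and further components would be extracted after deflation; both steps are omitted here.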
【 License 】
CC BY
© Lê Cao et al; licensee BioMed Central Ltd. 2011