BMC Bioinformatics
mAPKL: R/Bioconductor package for detecting gene exemplars and revealing their characteristics
George Spyrou [1], Argiris Sakellariou [2]
[1] Center of Systems Biology, Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou Street, Athens 115 27, Greece
[2] Department of Informatics and Telecommunications, National & Kapodistrian University of Athens, Athens, Greece
Keywords: Bioconductor; R; Gene expression; Microarray; Differential expression; Feature extraction
DOI: 10.1186/s12859-015-0719-5
Received 2015-05-22, accepted 2015-08-24, published 2015
【 Abstract 】
Background
Many algorithms have been proposed for detecting significant genes in microarray analysis. Several of them are freely available as R packages, but using them for gene expression analysis is often frustrating for non-bioinformaticians. Moreover, only some of these packages offer a complete suite of tools, from initial data import through to an analysis report. Here we present an R/Bioconductor package that implements a hybrid gene selection method together with a set of functions that facilitate a thorough and convenient gene expression profiling analysis.
Results
mAPKL is an open-source R/Bioconductor package that implements the mAP-KL hybrid gene selection method. The advantage of this method is that it selects a small number of gene exemplars while achieving classification results comparable to those of other well-established algorithms across a variety of datasets and dataset sizes. The mAPKL package comes with additional functionality, including (i) solid data import; (ii) data sampling following a user-defined proportion; (iii) preprocessing through several normalization and transformation alternatives; (iv) classification with the aid of SVM and performance evaluation; (v) network analysis of the significant genes (exemplars), including degree centrality, closeness, betweenness and clustering coefficient, as well as the construction of an edge list table; (vi) gene annotation analysis; (vii) pathway analysis; and (viii) auto-generated analysis reporting.
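The network characteristics listed in item (v) can be illustrated with a small, dependency-free sketch. It is written in Python for illustration only and does not use or reproduce the mAPKL API. The gene labels and the edge list are hypothetical, and how the package derives edges between exemplars is not covered here. Starting from an undirected edge list, the sketch computes degree centrality, closeness, betweenness (via Brandes' algorithm) and the local clustering coefficient for each node.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def degree_centrality(adj):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(adj)
    return {v: len(nbs) / (n - 1) for v, nbs in adj.items()}


def closeness_centrality(adj):
    """Reachable node count divided by the summed distances to those nodes."""
    out = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        out[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return out


def clustering_coefficient(adj):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    out = {}
    for v, nbs in adj.items():
        k = len(nbs)
        if k < 2:
            out[v] = 0.0
            continue
        links = sum(1 for a in nbs for b in nbs if a < b and b in adj[a])
        out[v] = 2.0 * links / (k * (k - 1))
    return out


def betweenness_centrality(adj):
    """Unnormalised shortest-path betweenness (Brandes, unweighted graph)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical edge list between four gene exemplars.
edges = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G3", "G4")]
adj = build_adjacency(edges)
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
betweenness = betweenness_centrality(adj)
clustering = clustering_coefficient(adj)

for gene in sorted(adj):
    print(gene, degree[gene], closeness[gene], betweenness[gene], clustering[gene])
```

In this toy graph, G3 is the hub: it touches every other node and sits on every shortest path to G4, so it scores highest on degree, closeness and betweenness, while its clustering coefficient is lower than that of G1 and G2 because only one pair of its neighbours is connected.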
Conclusions
Users can run a thorough gene expression analysis in a timely manner, starting from raw data and concluding with the network characteristics of the selected gene exemplars. Detailed instructions and example data are provided in the R package, which is freely available from Bioconductor under the GPL-2 or later license: http://www.bioconductor.org/packages/3.1/bioc/html/mAPKL.html
【 License 】
2015 Sakellariou and Spyrou.