Journal Article Details
BMC Bioinformatics
mAPKL: R/Bioconductor package for detecting gene exemplars and revealing their characteristics
George Spyrou [1], Argiris Sakellariou [2]
[1]Center of Systems Biology, Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou Street, Athens 115 27, Greece
[2]Department of Informatics and Telecommunications, National & Kapodistrian University of Athens, Athens, Greece
Keywords: Bioconductor; R; Gene expression; Microarray; Differential expression; Feature extraction
DOI  :  10.1186/s12859-015-0719-5
Received 2015-05-22, accepted 2015-08-24, published 2015
【 Abstract 】

Background

Many algorithms have been proposed for detecting significant genes in microarray analysis. Several of these approaches are freely available as R packages, though using them for gene expression analysis is often frustrating for non-bioinformaticians. Moreover, only some of these packages offer a complete suite of tools, from initial data import to the final analysis report. Here we present an R/Bioconductor package that implements a hybrid gene selection method along with a set of functions that facilitate thorough and convenient gene expression profiling analysis.

Results

mAPKL is an open-source R/Bioconductor package that implements the mAP-KL hybrid gene selection method. The advantage of this method is that it selects a small number of gene exemplars while achieving classification results comparable to those of other well-established algorithms across a variety of datasets and dataset sizes. The mAPKL package also provides additional functionality, including (i) solid data import; (ii) data sampling following a user-defined proportion; (iii) preprocessing through several normalization and transformation alternatives; (iv) classification with the aid of SVM and performance evaluation; (v) network analysis of the significant genes (exemplars), including degree centrality, closeness, betweenness, and clustering coefficient, as well as the construction of an edge list table; (vi) gene annotation analysis; (vii) pathway analysis; and (viii) auto-generated analysis reporting.
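As a rough illustration of the network attributes named in item (v), the following minimal Python sketch computes degree centrality, closeness and the local clustering coefficient from a small edge list. It is not the mAPKL implementation and does not use the package's API; the gene names, the toy edge list and all function names are hypothetical, and the graph is assumed to be undirected and connected.

```python
from collections import deque

# Hypothetical toy edge list; gene names are placeholders, not package output.
edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4")]


def build_adjacency(edge_list):
    """Return an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(nb) / (n - 1) for v, nb in adj.items()}


def closeness_centrality(adj):
    """Reachable nodes divided by the sum of shortest-path lengths (BFS)."""
    result = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[src] = (len(dist) - 1) / total if total else 0.0
    return result


def clustering_coefficient(adj):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    result = {}
    for v, nb in adj.items():
        k = len(nb)
        if k < 2:
            result[v] = 0.0
            continue
        nb_list = sorted(nb)
        links = sum(
            1
            for i, a in enumerate(nb_list)
            for b in nb_list[i + 1:]
            if b in adj[a]
        )
        result[v] = 2 * links / (k * (k - 1))
    return result


adj = build_adjacency(edges)
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
clustering = clustering_coefficient(adj)
```

In this toy network, `g3` is linked to every other gene and therefore has the highest degree centrality and closeness, while its clustering coefficient is low because only one pair of its neighbours is connected.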

Conclusions

Users can run a thorough gene expression analysis in a timely manner, starting from raw data and concluding with the network characteristics of the selected gene exemplars. Detailed instructions and example data are provided in the R package, which is freely available from Bioconductor under the GPL-2 or later license: http://www.bioconductor.org/packages/3.1/bioc/html/mAPKL.html.

【 License 】

   
2015 Sakellariou and Spyrou.
