Journal article details
BMC Bioinformatics
A fast algorithm for determining bounds and accurate approximate p-values of the rank product statistic for replicate experiments
Rainer Breitling [1], Rob Eisinga [2], Tom Heskes [3]
[1]Manchester Institute of Biotechnology, Faculty of Life Sciences, University of Manchester, Manchester, UK
[2]Department of Social Science Research Methods, Radboud University Nijmegen, Nijmegen, The Netherlands
[3]Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
Keywords: Metabolomics; Proteomics; Transcriptomics; p-value distribution; Rank product statistic
Others  :  1084978
DOI  :  10.1186/s12859-014-0367-1
Received 2014-08-06, accepted 2014-10-29, published 2014
【 Abstract 】

Background

The rank product method is a powerful statistical technique for identifying differentially expressed molecules in replicated experiments. A critical issue in molecule selection is the accurate calculation of the p-value of the rank product statistic, which is needed to adequately address multiple testing. Exact calculation as well as permutation and gamma approximations have been proposed to determine molecule-level significance. These current approaches have serious drawbacks: they are either computationally burdensome or provide inaccurate estimates in the tail of the p-value distribution.
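To make the trade-off above concrete, here is a minimal sketch in Python, not the authors' R code. It assumes that each of n molecules is ranked from 1 to n independently in each of k replicates, and that the rank product is the plain product of a molecule's k ranks. The function names are hypothetical. The gamma approximation is written in one common form, which treats r/(n+1) as a continuous uniform variable so that the negative log of the scaled rank product follows a Gamma(k, 1) distribution. The exact p-value is obtained by brute-force enumeration, which is only feasible for very small n and k.

```python
import math
from itertools import product


def rank_product(ranks):
    """Product of one molecule's ranks across its k replicates."""
    return math.prod(ranks)


def gamma_pvalue(rp, n, k):
    """Gamma approximation to P(RP <= rp) under the null hypothesis.

    Assumption: r / (n + 1) is treated as continuous uniform on (0, 1),
    so -sum(log(r_i / (n + 1))) is approximately Gamma(k, 1).
    """
    x = k * math.log(n + 1) - math.log(rp)
    # Survival function of Gamma(k, 1) for integer k (Erlang distribution):
    # exp(-x) * sum_{j=0}^{k-1} x^j / j!
    return math.exp(-x) * sum(x**j / math.factorial(j) for j in range(k))


def exact_pvalue_bruteforce(rp, n, k):
    """Exact P(RP <= rp) by enumerating all n**k equally likely rank tuples.

    The cost grows as n**k, so this is for illustration on tiny inputs only.
    """
    hits = sum(
        1
        for ranks in product(range(1, n + 1), repeat=k)
        if math.prod(ranks) <= rp
    )
    return hits / n**k


if __name__ == "__main__":
    n, k = 10, 3                      # toy sizes, chosen for illustration
    rp = rank_product([1, 1, 1])      # smallest possible rank product
    print("exact:", exact_pvalue_bruteforce(rp, n, k))
    print("gamma:", gamma_pvalue(rp, n, k))
```

On toy inputs like these, the two values can be compared directly at the extreme tail, where the enumeration is exact but scales as n**k and the gamma form is cheap but only approximate.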

Results

We derive strict lower and upper bounds to the exact p-value along with an accurate approximation that can be used to assess the significance of the rank product statistic in a computationally fast manner. The bounds and the proposed approximation are shown to provide far better accuracy than existing approximate methods in determining tail probabilities, with the slightly conservative upper bound protecting against false positives. We illustrate the proposed method in the context of a recently published analysis of transcriptomic profiling performed in blood.

Conclusions

We provide a method to determine upper bounds and accurate approximate p-values of the rank product statistic. The proposed algorithm provides an order of magnitude increase in throughput compared with current approaches and offers the opportunity to explore new application domains with even larger multiple testing issues. The R code is published in one of the Additional files and is available at http://www.ru.nl/publish/pages/726696/rankprodbounds.zip.

【 License 】

   
2014 Heskes et al.; licensee BioMed Central Ltd.
