Journal of Biometrics & Biostatistics
Inference and Sample Size Calculations Based on Statistical Tests in a Negative Binomial Distribution for Differential Gene Expression in RNA-seq Data
Article
Xiaohong Li [1], Nigel GF Cooper [2], Yu Shyr [3], Dongfeng Wu [1], Eric C Rouchka [4], Ryan S Gill [5], Timothy E O’Toole [6], Guy N Brock [1], Shesh N Rai [1]
[1] Department of Bioinformatics and Biostatistics, University of Louisville; [2] Department of Anatomical Sciences and Neurobiology, University of Louisville; [3] Department of Biostatistics, Vanderbilt University; [4] Department of Computer Engineering and Computer Science, University of Louisville; [5] Department of Mathematics, University of Louisville; [6] Department of Cardiology, University of Louisville; [7] Department of Biomedical Informatics, Ohio State University
Keywords: FDR; Sample size; Wald test; Exact test; Likelihood ratio test
DOI: 10.4172/2155-6180.1000332
Source: Hilaris Publisher
Abstract
High-throughput RNA sequencing (RNA-seq) has become a popular method of choice for transcriptomics and for detecting differentially expressed genes. Sample size calculation is an important consideration when designing RNA-seq experiments in biological research and clinical trials. Sample size formulas derived from the Wald and likelihood ratio tests under a Poisson model of RNA-seq counts have already been developed. However, because the variance of read counts in real RNA-seq data is not equal to the mean (it is typically larger, i.e., the counts are overdispersed), Li et al. proposed in 2013 an extended method that calculates sample sizes under a negative binomial distribution using an exact test statistic. In this study, we derive five alternative sample size calculation methods based on the negative binomial distribution, using the Wald test, the log-transformed Wald test, and the log-likelihood ratio test statistics. We compared our five methods with the existing method by calculating sample sizes and simulated power under different scenarios. We first calculated the sample sizes for testing a single gene with each of the six methods at a nominal significance level α = 0.05 and 80% power. We then calculated the sample sizes for testing multiple genes while controlling the false discovery rate (FDR) at 0.05 and 0.10. The empirical power and the true prognostic genes identified in differential gene expression analysis at the sample sizes estimated by the six methods were also evaluated in simulation studies. The sample size formulas derived from the log-transformed and Wald-based tests yielded smaller sample sizes than the other methods while keeping the empirical power close to or above the nominal 80% in all settings. Moreover, the Wald-test-based sample size calculation is simpler and faster to compute when designing an RNA-seq experiment.
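To illustrate the kind of Wald-based calculation summarized above, the sketch below computes a per-group sample size for a two-sided log-transformed Wald test comparing two negative binomial means, and an FDR-adjusted per-gene significance level for testing many genes. It is a minimal sketch under stated assumptions, not a reproduction of the paper's five formulas: it assumes the parameterization Var(Y) = μ + φμ² with a common dispersion φ, equal library sizes and equal group sizes, and uses the standard delta-method approximation Var(log μ̂₁ − log μ̂₀) ≈ (1/μ₀ + 1/μ₁ + 2φ)/n. The multiple-testing step swaps in the per-test adjustment α* = r₁f / (m₀(1 − f)) of Jung (2005), which is also an assumption here. The function names and design values are hypothetical.

```python
import math
from statistics import NormalDist


def nb_log_wald_sample_size(mu0, fold_change, dispersion, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided log-transformed Wald test of
    H0: mu1 == mu0 for negative binomial counts (illustrative sketch).

    Assumptions (not taken from the paper): Var(Y) = mu + dispersion * mu**2,
    a common dispersion in both groups, equal library sizes, equal group
    sizes, and mu1 = fold_change * mu0.
    """
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # normal quantile for target power
    mu1 = fold_change * mu0
    # Delta method: Var(log mean1 - log mean0) ~ (1/mu0 + 1/mu1 + 2*dispersion) / n
    unit_variance = 1 / mu0 + 1 / mu1 + 2 * dispersion
    n = (z_alpha + z_beta) ** 2 * unit_variance / math.log(fold_change) ** 2
    return math.ceil(n)


def fdr_adjusted_alpha(m, m1, fdr, power=0.80):
    """Per-gene significance level alpha* = r1 * f / (m0 * (1 - f)),
    following Jung (2005) (an assumption here), where r1 = m1 * power is the
    expected number of true rejections and m0 = m - m1 non-prognostic genes."""
    r1 = m1 * power
    m0 = m - m1
    return r1 * fdr / (m0 * (1 - fdr))


if __name__ == "__main__":
    # Hypothetical design values, not from the paper.
    n_single = nb_log_wald_sample_size(mu0=10, fold_change=2.0, dispersion=0.1)
    alpha_star = fdr_adjusted_alpha(m=10000, m1=100, fdr=0.05)
    n_multi = nb_log_wald_sample_size(10, 2.0, 0.1, alpha=alpha_star)
    print(n_single, alpha_star, n_multi)
```

With μ₀ = 10, a two-fold change and φ = 0.1, this sketch gives 6 samples per group at α = 0.05 and 80% power; controlling the FDR at 0.05 with 100 prognostic genes among 10,000 tightens α* to about 4.3 × 10⁻⁴ and raises the requirement to 14 per group. The closed form is what makes Wald-type calculations cheap: no iteration or simulation is needed to obtain n.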
License
Unknown