| BMC Genomics | |
| Technical and biological variance structure in mRNA-Seq data: life in the real world | |
| Research Article | |
| Terry M Therneau1  Diane E Grill2  Ann L Oberg2  Brian M Bot3  Gregory A Poland4  | |
| [1] Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st St SW, 55905, Rochester, MN, USA; Mayo Vaccine Research Group, Mayo Clinic, 200 1st St SW, 55905, Rochester, MN, USA; Statistical Genetics, Sage Bionetworks, 1100 Fairview Ave N, M1-C108, 98109, Seattle, WA, USA; Program in Translational Immunovirology and Biodefense, Mayo Clinic, 200 1st St SW, 55905, Rochester, MN, USA; Department of Medicine, Mayo Clinic, 200 1st St SW, 55905, Rochester, MN, USA | |
| Keywords: Poisson Distribution; Flow Cell; Negative Binomial Distribution; Negative Binomial; Software Upgrade | |
| DOI : 10.1186/1471-2164-13-304 | |
| received 2012-01-20, accepted 2012-07-07, published in 2012 | |
| Source: Springer | |
【 Abstract 】
Background: mRNA expression data from next-generation sequencing platforms are obtained in the form of counts per gene or exon. Counts have classically been assumed to follow a Poisson distribution, in which the variance is equal to the mean. The Negative Binomial distribution, which allows for over-dispersion (i.e., for the variance to be greater than the mean), is also commonly used to model count data.
Results: In mRNA-Seq data from 25 subjects, we found that technical variation generally followed a Poisson distribution, as has been reported previously, whereas biological variability was over-dispersed relative to the Poisson model. The mean-variance relationship across all genes was quadratic, in keeping with a Negative Binomial (NB) distribution. Over-dispersed Poisson and NB distributional assumptions demonstrated marked improvements in goodness-of-fit (GOF) over the standard Poisson model assumptions, but with evidence of over-fitting in some genes. Modeling of experimental effects improved GOF for high-variance genes but increased the over-fitting problem.
Conclusions: These conclusions will guide the development of analytical strategies for accurate modeling of variance structure in these data and for sample size determination, which in turn will aid in the identification of true biological signals that inform our understanding of biological systems.
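To make the mean-variance contrast in the abstract concrete, the following is a minimal sketch, not the authors' analysis. It assumes the common NB parameterization in which the variance equals the mean plus a dispersion term times the squared mean. The symbol `dispersion`, the function names, and the toy counts are all illustrative assumptions that do not appear in the article.

```python
import statistics


def poisson_variance(mean):
    """Under a Poisson model the variance equals the mean."""
    return mean


def nb_variance(mean, dispersion):
    """Variance under an assumed NB parameterization: mean + dispersion * mean^2.

    With dispersion = 0 this reduces to the Poisson variance, which is why
    the NB mean-variance relationship is described as quadratic.
    """
    return mean + dispersion * mean ** 2


def moment_dispersion(counts):
    """Method-of-moments dispersion estimate for one gene's counts.

    Uses the sample mean and the unbiased sample variance; the estimate is
    clamped at 0 when the counts show no over-dispersion.
    """
    mean = statistics.mean(counts)
    if mean <= 0:
        raise ValueError("mean count must be positive")
    variance = statistics.variance(counts)
    return max(0.0, (variance - mean) / mean ** 2)


# Hypothetical counts for a single gene across five subjects (made-up values).
toy_counts = [8, 12, 10, 15, 5]
toy_dispersion = moment_dispersion(toy_counts)
```

For the toy counts, the sample variance exceeds the sample mean, so the estimated dispersion is positive. Counts whose variance does not exceed the mean would yield an estimate of zero, which corresponds to the plain Poisson case.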
【 License 】
CC BY
© Oberg et al.; licensee BioMed Central Ltd. 2012