BMC Genomics
A flexible Bayesian method for detecting allelic imbalance in RNA-seq data
Rita M Graze [2], Justin M Fear [3], Lauren M McIntyre [3], Luis G León-Novelo [1]
[1] Department of Mathematics, University of Louisiana at Lafayette, 70503 Lafayette, LA, USA
[2] Department of Biological Sciences, Auburn University, 101 Rouse Life Science Building, 36849 Auburn, AL, USA
[3] Department of Molecular Genetics and Microbiology, University of Florida, 32611 Gainesville, FL, USA
Keywords: Bayesian model; Systematic error; RNA-seq; Allele-specific expression; Allelic imbalance
DOI: 10.1186/1471-2164-15-920
Received 2014-05-30; accepted 2014-10-09; published 2014
【 Abstract 】
Background
One method of identifying cis-regulatory differences is to analyze allele-specific expression (ASE) and identify cases of allelic imbalance (AI). RNA-seq is the most common way to measure ASE, and a binomial test is often applied to determine the statistical significance of AI. The binomial test implicitly assumes that AI is estimated without bias. However, bias has been found to result from multiple factors, including genome ambiguity, reference quality, the mapping algorithm, and biases in the sequencing process. Two alternative approaches have been developed to handle bias: adjusting for it with a statistical model, and filtering out regions of the genome suspected of harboring it. Existing statistical models that account for bias rely on information from DNA controls, which can be cost-prohibitive for large intraspecific studies. In contrast, data filtering is inexpensive and straightforward, but necessarily sacrifices a portion of the data.
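As a minimal sketch of why the no-bias assumption matters, the Python snippet below runs an exact two-sided binomial test on one set of allele counts twice: once against the usual null proportion of 0.5, and once against a null shifted to a bias value. The counts (60 of 100 reads on allele 1) and the bias value of 0.6 are hypothetical numbers chosen for illustration, and the helper names are not taken from the paper.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space for stability."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided p-value: total mass of outcomes no more likely than k."""
    observed = binom_pmf(k, n, p)
    cutoff = observed * (1.0 + 1e-7)  # small tolerance for floating-point ties
    total = sum(m for m in (binom_pmf(i, n, p) for i in range(n + 1)) if m <= cutoff)
    return min(1.0, total)

# Hypothetical counts: 60 of 100 reads assigned to allele 1.
k, n = 60, 100

# Usual test: assumes reads split 50/50 when there is no allelic imbalance.
p_naive = binom_test_two_sided(k, n, p=0.5)

# Same data, but the null proportion is set to an assumed bias of 0.6
# (that is, 60% of reads would map to allele 1 even without imbalance).
p_adjusted = binom_test_two_sided(k, n, p=0.6)

print(f"null 0.5: p = {p_naive:.4f}")
print(f"null 0.6: p = {p_adjusted:.4f}")
```

With these hypothetical numbers the unadjusted test returns a much smaller p-value than the bias-adjusted one, so a skew that is entirely due to bias can look like evidence of AI when the null is fixed at 0.5.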
Results
Here we propose a flexible Bayesian model for the analysis of AI that accounts for bias and can be implemented without DNA controls. In lieu of DNA controls, this Poisson-Gamma (PG) model uses an estimate of bias obtained from simulations. The proposed model always has a lower type I error rate than the binomial test. Consistent with prior studies, bias dramatically affects the type I error rate. All of the tested models are sensitive to misspecification of bias: the closer the estimate of bias is to the true underlying bias, the lower the type I error rate. When bias is estimated correctly, the test achieves its nominal level α.
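The abstract does not spell out the PG model, so the following is only a simplified conjugate Poisson-Gamma sketch of the general idea, not the authors' model. It assumes allele-1 counts follow Poisson(2qλ1) and allele-2 counts follow Poisson(2(1−q)λ2), where q is a bias estimate supplied from outside (for example, from simulation) and q = 0.5 means no bias. The Gamma(a, b) priors, the replicate counts, and the decision rule (flag AI when the 95% credible interval for θ = λ1/(λ1+λ2) excludes 0.5) are all illustrative assumptions.

```python
import random

def posterior_theta_draws(x_counts, y_counts, q, a=0.5, b=0.5,
                          n_draws=20000, seed=1):
    """Monte Carlo draws of theta = lam1 / (lam1 + lam2) under Gamma posteriors.

    Assumed model (illustrative only):
      x_i ~ Poisson(2 * q * lam1),  y_i ~ Poisson(2 * (1 - q) * lam2),
      lam1, lam2 ~ Gamma(shape=a, rate=b), independently.
    """
    rng = random.Random(seed)
    n_rep = len(x_counts)
    # Conjugate update: shape gains the summed counts, rate gains the exposure.
    shape1, rate1 = a + sum(x_counts), b + 2.0 * q * n_rep
    shape2, rate2 = a + sum(y_counts), b + 2.0 * (1.0 - q) * n_rep
    draws = []
    for _ in range(n_draws):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        lam1 = rng.gammavariate(shape1, 1.0 / rate1)
        lam2 = rng.gammavariate(shape2, 1.0 / rate2)
        draws.append(lam1 / (lam1 + lam2))
    return draws

def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from sorted Monte Carlo draws."""
    s = sorted(draws)
    lo = s[int((1.0 - level) / 2.0 * len(s))]
    hi = s[int((1.0 + level) / 2.0 * len(s)) - 1]
    return lo, hi

def flags_imbalance(x_counts, y_counts, q, level=0.95):
    """True when the credible interval for theta excludes 0.5."""
    lo, hi = credible_interval(posterior_theta_draws(x_counts, y_counts, q), level)
    return not (lo <= 0.5 <= hi)

# Hypothetical read counts for three replicates of one gene.
x = [62, 58, 60]  # reads assigned to allele 1
y = [40, 42, 38]  # reads assigned to allele 2

print("q = 0.5:", flags_imbalance(x, y, q=0.5))
print("q = 0.6:", flags_imbalance(x, y, q=0.6))
```

In this toy setting the same counts are flagged as imbalanced when bias is ignored (q = 0.5) but not when the assumed bias of 0.6 is supplied, which mirrors the abstract's point that the quality of the bias estimate drives the type I error rate.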
Conclusions
To improve the assessment of AI, some forms of systematic error (e.g., map bias) can be identified using simulation. The resulting estimates of bias can then be used to correct for bias in the PG model, without data filtering. Other sources of bias (e.g., unidentified variant calls) are easily captured by DNA controls but are missed by common filtering approaches. Consequently, as variant identification improves, the need for DNA controls will be reduced. Filtering does not significantly improve performance and is not recommended, as information is sacrificed without a measurable gain. The PG model developed here performs well when bias is known or only slightly misspecified. The model is flexible and can accommodate differences in experimental design and bias estimation.
【 License 】
2014 León-Novelo et al.; licensee BioMed Central Ltd.