BMC Genomics
Improving reliability and absolute quantification of human brain microarray data by filtering and scaling probes using RNA-Seq
Mike J Hawrylycz¹, Ed S Lein¹, John W Phillips¹, Elaine H Shen¹, Kimberly A Smith¹, Chang-Kyu Lee¹, Ajamete Kaykas¹, Jeff Goldy¹, Vilas Menon¹, Jeremy A Miller¹
¹ Allen Institute for Brain Science, 551 N 34th Street, Seattle, WA 98103, USA
Keywords: Brain; Gene expression; Reliability; Transcriptome profiling; High-throughput sequencing; RNA-Seq; Microarray; Allen Brain Atlas
DOI: 10.1186/1471-2164-15-154
Received 2013-11-26; accepted 2014-01-30; published 2014.
Abstract
Background
High-throughput sequencing is gradually replacing microarrays as the preferred method for studying mRNA expression levels, providing nucleotide resolution and accurately measuring absolute expression levels of almost any transcript, known or novel. However, existing microarray data from clinical, pharmaceutical, and academic settings represent valuable and often underappreciated resources, and methods for assessing and improving the quality of these data are lacking.
Results
To quantitatively assess the quality of microarray probes, we directly compare RNA-Seq with Agilent microarrays by processing 231 unique samples from the Allen Human Brain Atlas with RNA-Seq. Both techniques provide highly consistent, highly reproducible gene expression measurements in adult human brain, with RNA-Seq slightly outperforming microarrays overall. We show that RNA-Seq can be used as ground truth to assess the reliability of most microarray probes, to remove probes with off-target effects, and to scale probe intensities to match the expression levels identified by RNA-Seq. These sequencing-scaled microarray intensities (SSMIs) provide more reliable, quantitative estimates of absolute expression levels for many genes than unscaled intensities do. Finally, we validate this result in two human cell lines, showing that linear scaling factors can be applied across experiments using the same microarray platform.
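The abstract describes two uses of RNA-Seq as ground truth: discarding probes that disagree with it and linearly rescaling the intensities of the remaining probes. It does not give the fitting procedure, so the following is only a minimal sketch under stated assumptions: expression is compared on a log2 scale across matched samples, a probe is kept if the Pearson correlation between its intensities and the RNA-Seq expression of its target gene reaches a hypothetical cutoff (`min_r = 0.5`), and each kept probe gets an ordinary least-squares line mapping array intensity to RNA-Seq expression. All names (`ProbeScaling`, `calibrate_probes`, `apply_scaling`) are illustrative and are not taken from the authors' code.

```python
import math
from dataclasses import dataclass


@dataclass
class ProbeScaling:
    """Per-probe calibration against RNA-Seq (hypothetical structure)."""
    probe_id: str
    r: float          # Pearson correlation, array vs. RNA-Seq, across matched samples
    slope: float      # linear scaling factor on the log2 scale
    intercept: float  # offset on the log2 scale


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def calibrate_probes(array_log2, rnaseq_log2, min_r=0.5):
    """Keep probes that agree with RNA-Seq and fit a per-probe linear scaling.

    array_log2:  probe_id -> log2 microarray intensities across samples
    rnaseq_log2: probe_id -> log2 RNA-Seq expression of the probe's target gene
                 in the same samples and order (e.g. log2(FPKM + 1); assumed)
    min_r:       hypothetical correlation cutoff; must be > 0 so that
                 zero-variance probes (r = 0.0) are always discarded
    """
    if not 0.0 < min_r <= 1.0:
        raise ValueError("min_r must be in (0, 1]")
    kept = {}
    for probe_id, x in array_log2.items():
        y = rnaseq_log2[probe_id]
        if len(x) != len(y):
            raise ValueError(f"sample mismatch for probe {probe_id}")
        r = pearson(x, y)
        if r < min_r:
            continue  # poor agreement with ground truth, e.g. an off-target probe
        slope, intercept = fit_line(x, y)
        kept[probe_id] = ProbeScaling(probe_id, r, slope, intercept)
    return kept


def apply_scaling(scalings, array_log2):
    """Map log2 intensities of retained probes onto the RNA-Seq scale."""
    scaled = {}
    for probe_id, values in array_log2.items():
        s = scalings.get(probe_id)
        if s is None:
            continue  # probe was filtered out during calibration
        scaled[probe_id] = [s.slope * v + s.intercept for v in values]
    return scaled


if __name__ == "__main__":
    array = {"good": [1.0, 2.0, 3.0, 4.0], "off_target": [1.0, 2.0, 3.0, 4.0]}
    rnaseq = {"good": [3.0, 5.0, 7.0, 9.0], "off_target": [9.0, 7.0, 5.0, 3.0]}
    scalings = calibrate_probes(array, rnaseq)
    print(sorted(scalings))  # ['good']
    print(apply_scaling(scalings, {"good": [0.0, 2.0], "off_target": [1.0]}))
    # {'good': [1.0, 5.0]}
```

Because the fitted slope and intercept depend only on the probe, `apply_scaling` can be reused on intensities from another experiment run on the same platform, which loosely mirrors the cross-experiment use of linear scaling factors described above. A correlation-based filter also needs variation across samples: in this simplified sketch, an accurate probe for a uniformly expressed gene would be discarded.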
Conclusions
Microarrays provide consistent, reproducible gene expression measurements, and these measurements can be further improved by using RNA-Seq as ground truth. We expect that our strategy could be used to improve probe quality for many data sets in major existing repositories.
License
© 2014 Miller et al.; licensee BioMed Central Ltd.
Figures 1-6 (images not reproduced in this text).