Journal article

【Abstract】

Background: Although ultrahigh-throughput RNA-Sequencing has become the dominant technology for genome-wide transcriptional profiling, the vast majority of RNA-Seq studies profile only tens of samples, and most analytical pipelines are optimized for these smaller studies. However, projects are generating ever-larger data sets comprising RNA-Seq data from hundreds or thousands of samples, often collected at multiple centers and from diverse tissues. These complex data sets present significant analytical challenges due to batch and tissue effects, but provide the opportunity to revisit the assumptions and methods that we use to preprocess, normalize, and filter RNA-Seq data, which are critical first steps for any subsequent analysis.

Results: We find that analysis of large RNA-Seq data sets requires both careful quality control and accounting for the sparsity that arises from the heterogeneity intrinsic to multi-group studies. We developed the Yet Another RNA Normalization software pipeline (YARN), which includes quality control and preprocessing, gene filtering, and normalization steps designed to facilitate downstream analysis of large, heterogeneous RNA-Seq data sets, and we demonstrate its use with data from the Genotype-Tissue Expression (GTEx) project.

Conclusions: An R package instantiating YARN is available at http://bioconductor.org/packages/yarn.
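
The filtering and normalization steps named in the abstract can be illustrated with a minimal sketch in Python with NumPy. This is not the YARN package's code or API (YARN is an R package). The function names, the `min_count` threshold, the rule of requiring expression in at least as many samples as the smallest group, and the use of plain quantile normalization within each group are all assumptions made for illustration.

```python
import numpy as np


def filter_genes(counts, groups, min_count=1.0):
    """Return a boolean mask over genes (rows) to keep.

    Assumed rule: keep a gene if it reaches `min_count` in at least as many
    samples as the smallest group contains, so that a gene expressed in only
    one tissue is not discarded by a global threshold.
    """
    counts = np.asarray(counts, dtype=float)
    _, sizes = np.unique(np.asarray(groups), return_counts=True)
    min_group_size = sizes.min()
    return (counts >= min_count).sum(axis=1) >= min_group_size


def quantile_normalize(mat):
    """Plain quantile normalization of the columns of a genes x samples matrix.

    Ties are not averaged here; tied values receive adjacent reference values.
    """
    mat = np.asarray(mat, dtype=float)
    order = np.argsort(mat, axis=0, kind="stable")
    sorted_vals = np.take_along_axis(mat, order, axis=0)
    reference = sorted_vals.mean(axis=1)  # mean of each rank across samples
    out = np.empty_like(mat)
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out


def normalize_within_groups(mat, groups):
    """Quantile-normalize samples separately within each group (e.g. tissue)."""
    mat = np.asarray(mat, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(mat)
    for g in np.unique(groups):
        mask = groups == g
        out[:, mask] = quantile_normalize(mat[:, mask])
    return out
```

Normalizing within each group rather than across all samples avoids forcing samples from very different tissues onto one shared distribution, which is the kind of heterogeneity the abstract refers to.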

【License】

CC BY
© The Author(s). 2017

BMC Bioinformatics
Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data
Software
Abhijeet R. Sonawane¹ Camila M. Lopes-Ramos² Maud Fagny² Cho-Yi Chen² John Platig² Marieke L. Kuijjer² Kimberly Glass³ John Quackenbush⁴ Joseph N. Paulson⁵
[1] Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, 02215, Boston, MA, USA;Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 02215, Boston, MA, USA;Department of Biostatistics, Harvard School of Public Health, 02215, Boston, MA, USA;Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 02215, Boston, MA, USA;Department of Biostatistics, Harvard School of Public Health, 02215, Boston, MA, USA;Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, 02215, Boston, MA, USA;Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 02215, Boston, MA, USA;Department of Biostatistics, Harvard School of Public Health, 02215, Boston, MA, USA;Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, 02215, Boston, MA, USA;Department of Cancer Biology, Dana-Farber Cancer Institute, 02215, Boston, MA, USA;Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 02215, Boston, MA, USA;Department of Biostatistics, Harvard School of Public Health, 02215, Boston, MA, USA;Present address: Genentech, Department of Biostatistics, Product Development, 1 DNA Way, 94080, South San Francisco, CA, USA;
Keywords: GTEx; RNA-Seq; Quality control; Filtering; Preprocessing; Normalization
DOI: 10.1186/s12859-017-1847-x
Received 2017-04-19, accepted 2017-09-21, published 2017
Source: Springer