Journal Article Details
Frontiers in Genetics
Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data
Stephen S. Ferguson [1], Scott S. Auerbach [1], Richard S. Paules [1], Pierre R. Bushel [1], Sreenivasa C. Ramaiahgari [1]
[1] Biomolecular Screening Branch, National Institute of Environmental Health Sciences of National Institutes of Health, Durham, NC, United States; Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences of National Institutes of Health, Durham, NC, United States; Massive Genome Informatics Group, National Institute of Environmental Health Sciences of National Institutes of Health, Durham, NC, United States
Keywords: TempO-Seq; normalization; gene expression; mRNA; transcription
DOI: 10.3389/fgene.2020.00594
Source: DOAJ
【 Abstract 】

Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data in two groups (treated vs. untreated) at seven fold-change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq, and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between the treated and untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84 and sensitivity greater than 0.90, except for the no change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed most closely with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite assuming that the majority of genes are unchanged, the DESeq2 scaling factors normalization method performed reasonably well, as did the simple normalization procedures counts per million (CPM) and total counts (TC). These results suggest that for two-class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers to normalize TempO-Seq gene expression data for more reliable results.
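
For orientation, the three simplest scaling methods named in the abstract (CPM, TC, and UQ) can be sketched in a few lines of Python. This is a minimal illustration using commonly cited textbook definitions, not the authors' pipeline: the abstract does not state which variants or software were used. The function names, the genes-by-samples matrix layout, and the toy counts are all assumptions made for this example.

```python
import numpy as np

def cpm(counts):
    """Counts per million: scale each sample (column) to a library size of 1e6."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    return counts / lib_size * 1e6

def total_count(counts):
    """Total count (TC): divide by library size, rescale by the mean library size."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    return counts / lib_size * lib_size.mean()

def upper_quartile(counts):
    """Upper quartile (UQ): divide each sample by its 75th percentile count,
    computed over genes that are not zero in every sample, then rescale by
    the mean upper quartile. Assumes no sample has an upper quartile of zero."""
    counts = np.asarray(counts, dtype=float)
    expressed = counts[counts.sum(axis=1) > 0]
    uq = np.percentile(expressed, 75, axis=0)
    return counts / uq * uq.mean()

# Hypothetical toy matrix: 4 genes (rows) x 2 samples (columns).
# Sample 2 was sequenced twice as deeply as sample 1.
toy_counts = np.array([[10, 20],
                       [20, 40],
                       [30, 60],
                       [40, 80]])

cpm_norm = cpm(toy_counts)
tc_norm = total_count(toy_counts)
uq_norm = upper_quartile(toy_counts)
```

Because the two toy samples differ only in sequencing depth, each method should bring their columns into agreement. Real TempO-Seq data will differ in composition as well as depth, which is where the methods compared in the study diverge.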

【 License 】

Unknown   
