Journal Article Details
BMC Bioinformatics
NBLDA: negative binomial linear discriminant analysis for RNA-Seq data
Research Article
Hongyu Zhao [1], Xiang Wan [2], Kai Dong [3], Tiejun Tong [3]
[1] Department of Biostatistics, Yale University, 06510, New Haven, CT, USA
[2] Department of Computer Science and Institute of Computational and Theoretical Studies, Hong Kong Baptist University, Kowloon Tong, Hong Kong
[3] Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Keywords: RNA-Seq; Negative binomial distribution; Linear discriminant analysis
DOI: 10.1186/s12859-016-1208-1
Received 2015-11-11, accepted 2016-08-24, published 2016
Source: Springer
【 Abstract 】

Background: RNA-sequencing (RNA-Seq) has become a powerful technology for characterizing gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods developed for microarray data can be applied to RNA-Seq data, they are not ideal because RNA-Seq data are discrete. The Poisson distribution and the negative binomial distribution are commonly used to model count data. Recently, Witten (Annals Appl Stat 5:2493–2518, 2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may be less appropriate than the negative binomial distribution when biological replicates are available and the data are overdispersed (i.e., when the variance is larger than the mean). However, negative binomial variables are more complicated to model because they involve a dispersion parameter that needs to be estimated.

Results: In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. Using Bayes' rule, we construct the classifier by fitting a negative binomial model, and we propose plug-in rules to estimate the unknown parameters in the classifier. We explore the relationship between the negative binomial classifier and the Poisson classifier, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze two real RNA-Seq data sets to demonstrate the advantages of our method in real-world applications.

Conclusions: We have developed a new classifier that uses the negative binomial model for RNA-Seq data classification. Our simulation results show that the proposed classifier performs better than existing methods. The proposed classifier can serve as an effective tool for classifying RNA-Seq data.
Based on the comparison results, we provide some guidelines to help scientists decide which method to use in the discriminant analysis of RNA-Seq data. R code is available at http://www.comp.hkbu.edu.hk/~xwan/NBLDA.R or https://github.com/yangchadam/NBLDA
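
To make the classification rule in the abstract concrete, the following is a minimal Python sketch of a Bayes-rule classifier built on a negative binomial model. It is not the authors' R implementation. The function names (`nb_log_pmf`, `nb_classify`), the mean/dispersion parametrization (variance = mu + phi * mu^2), and the assumption that class means, dispersions, and priors are already known are all illustrative choices; the paper's plug-in estimation of these parameters is not reproduced here.

```python
import math


def nb_log_pmf(x, mu, phi):
    """Log pmf of a negative binomial with mean mu and dispersion phi.

    Assumed parametrization: variance = mu + phi * mu**2, with phi > 0.
    """
    r = 1.0 / phi  # "size" parameter of the negative binomial
    log1p_term = math.log1p(phi * mu)  # log(1 + phi * mu)
    return (
        math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
        + x * (math.log(phi * mu) - log1p_term)
        - r * log1p_term
    )


def nb_classify(x, class_means, dispersions, priors):
    """Assign a count vector x to the class with the largest log posterior.

    class_means[k][j] is the (hypothetical, already estimated) mean of
    gene j in class k; dispersions[j] is the dispersion of gene j;
    priors[k] is the prior probability of class k. Genes are treated as
    independent given the class.
    """
    scores = []
    for means, prior in zip(class_means, priors):
        score = math.log(prior)
        for xj, mu, phi in zip(x, means, dispersions):
            score += nb_log_pmf(xj, mu, phi)
        scores.append(score)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Toy example with two genes and two classes (made-up numbers).
    label, scores = nb_classify(
        x=[10, 1],
        class_means=[[10.0, 1.0], [1.0, 10.0]],
        dispersions=[0.2, 0.2],
        priors=[0.5, 0.5],
    )
    print(label, scores)
```

As the dispersion `phi` approaches zero, the negative binomial pmf approaches the Poisson pmf with the same mean, which is one way to see how a negative binomial classifier of this form relates to a Poisson one.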

【 License 】

CC BY   
© The Author(s) 2016
