Journal article details
BMC Bioinformatics
Model-based clustering with certainty estimation: implication for clade assignment of influenza viruses
Methodology Article
Zhong Li [1], Guoqing Lu [2], Kevin Beland [2], Shunpu Zhang [3]
[1] College of Science, Zhejiang Sci-Tech University, 310018, Hangzhou, China; [2] Department of Biology, University of Nebraska at Omaha, 68182, Omaha, NE, USA; [3] Department of Statistics, University of Central Florida, 32816, Orlando, FL, USA
Keywords: Model-based clustering; Multidimensional scaling; Bootstrap; Certainty; Influenza A hemagglutinin (HA)
DOI: 10.1186/s12859-016-1147-x
Received 2015-10-30, accepted 2016-07-13, published 2016
Source: Springer
【 Abstract 】

Background: Clustering is a common technique used by molecular biologists to group homologous sequences and study evolution. Open issues remain, such as how to cluster molecular sequences accurately and, in particular, how to evaluate the certainty of clustering results.

Results: We present a model-based clustering method for analyzing molecular sequences, describe a subset bootstrap scheme for evaluating the certainty of the clusters, and show an intuitive way to examine clusters using 3D visualization. We applied this approach to influenza viral hemagglutinin (HA) sequences. Nine clusters were estimated for highly pathogenic H5N1 avian influenza, which agrees with previous findings. The certainty that a given sequence was correctly assigned to its cluster was 1.0 for every sequence, and the certainty for a given cluster was also very high (0.92 to 1.0), with an overall clustering certainty of 0.95. For influenza A H7 viruses, ten HA clusters were estimated, and the vast majority of sequences could be assigned to a cluster with a certainty of more than 0.99. The certainties for clusters, however, varied from 0.40 to 0.98; this variation is likely attributable to the heterogeneity of sequence data in different clusters. In both cases, the certainty values estimated with the subset bootstrap method were all higher than those calculated with the standard bootstrap method, suggesting that our bootstrap scheme is applicable to the estimation of clustering certainty.

Conclusions: We formulated a clustering analysis approach that includes the estimation of certainties and 3D visualization of sequence data. We analyzed two sets of influenza A HA sequences, and the results indicate that our approach is applicable to clustering analysis of influenza viral sequences.
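
To make the idea of a per-sequence certainty concrete, the following is a minimal sketch and not the authors' method. It assumes sequences have already been embedded as low-dimensional coordinates (for example by multidimensional scaling) and given reference cluster labels 0..K-1. It swaps in a plain nearest-centroid rule for the model-based clustering step, and it resamples within each reference cluster rather than using the paper's subset bootstrap scheme, whose details are not given in the abstract. The function names, the toy data, and the definition of overall certainty as a simple mean are all hypothetical.

```python
import random

def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def nearest(point, centroids):
    """Index of the centroid closest to `point` in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(point, centroids[k])))

def bootstrap_certainty(points, labels, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which each point keeps its reference label.

    Assumes labels are integers 0..K-1 and every cluster has at least one member.
    """
    rng = random.Random(seed)
    n_clusters = max(labels) + 1
    groups = [[p for p, l in zip(points, labels) if l == k] for k in range(n_clusters)]
    agree = [0] * len(points)
    for _ in range(n_boot):
        # Resample each reference cluster with replacement, then refit its centroid.
        cents = [centroid(rng.choices(g, k=len(g))) for g in groups]
        for i, p in enumerate(points):
            if nearest(p, cents) == labels[i]:
                agree[i] += 1
    return [a / n_boot for a in agree]

# Toy 2-D coordinates standing in for an embedding of sequence distances (made up).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
labels = [0, 0, 0, 1, 1, 1]
certainty = bootstrap_certainty(points, labels)
overall = sum(certainty) / len(certainty)  # simple mean, an assumed definition
```

With two well-separated toy clusters every replicate reproduces the reference assignment, so each per-sequence certainty is 1.0. Overlapping or heterogeneous clusters would give lower values.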

【 License 】

CC BY   
© The Author(s). 2016
