Journal Article Details
BMC Bioinformatics
A classification approach for genotyping viral sequences based on multidimensional scaling and linear discriminant analysis
Methodology Article
Kichan Lee [1], Sung Hee Park [1], Sangsoo Kim [1], Jiwoong Kim [2], Yongju Ahn [3]
[1] Department of Bioinformatics & Life Sciences, Soongsil University, 156-743, Seoul, Korea; [2] Equispharm Co., Ltd, 443-766, Suwon, Korea; [3] Macrogen Inc., 153-023, Seoul, Korea
Keywords: Reference Sequence; Linear Discriminant Analysis; Gene Segment; Recombinant Form; Quadratic Discriminant Analysis
DOI: 10.1186/1471-2105-11-434
Received 2009-12-15; accepted 2010-08-21; published 2010
Source: Springer
【 Abstract 】

Background: Accurate classification into genotypes is critical for understanding the evolution of divergent viruses. Here we report a new approach, MuLDAS, which classifies a query sequence based on statistical genotype models learned from known sequences. MuLDAS uses the full spectrum of well-characterized sequences as references, typically on the order of hundreds, to estimate the significance of each genotype assignment.

Results: MuLDAS starts by aligning the query sequence to the reference multiple sequence alignment and then calculating the distance matrix among the sequences. The sequences are then mapped to a principal coordinate space by multidimensional scaling, and the coordinates of the reference sequences are used as features in developing linear discriminant models that partition the space by genotype. The genotype of the query is then given as the maximum a posteriori estimate. MuLDAS tests the model confidence by leave-one-out cross-validation and also provides some heuristics for detecting 'outlier' sequences that fall far outside or in between genotype clusters. We tested our method by classifying HIV-1 and HCV nucleotide sequences downloaded from NCBI GenBank, achieving overall concordance rates of 99.3% and 96.6%, respectively, with the benchmark test datasets retrieved from the respective databases of Los Alamos National Laboratory.

Conclusions: The highly accurate genotype assignment, coupled with several measures for evaluating the results, makes MuLDAS useful for analyzing the sequences of rapidly evolving viruses such as HIV-1 and HCV. A web-based genotype prediction server is available at http://www.muldas.org/MuLDAS/.
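
The workflow summarized in the Results paragraph (distance matrix, multidimensional scaling, linear discriminant models, maximum a posteriori assignment, leave-one-out cross-validation) can be illustrated with a minimal sketch. This is not the MuLDAS implementation. The abstract does not say which distance measure, how many dimensions, or what regularization MuLDAS uses, so the p-distance, `k=2`, the small ridge term, the toy pre-aligned sequences, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def p_distance_matrix(seqs):
    """Pairwise fraction of mismatched columns between pre-aligned sequences
    (assumed distance measure; the abstract does not specify one)."""
    arr = np.array([list(s) for s in seqs])
    return (arr[:, None, :] != arr[None, :, :]).mean(axis=2)

def classical_mds(dist, k):
    """Principal coordinates: double-center squared distances, keep top-k axes."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:k]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

def lda_posteriors(train_x, train_y, x):
    """Posterior P(genotype | x) for Gaussian classes with a shared covariance."""
    classes = sorted(set(train_y))
    labels = np.array(train_y)
    means = np.array([train_x[labels == c].mean(axis=0) for c in classes])
    priors = np.array([(labels == c).mean() for c in classes])
    centered = train_x - means[[classes.index(c) for c in train_y]]
    pooled = centered.T @ centered / (len(train_y) - len(classes))
    pooled += 1e-6 * np.eye(pooled.shape[0])  # small ridge (assumption)
    inv = np.linalg.inv(pooled)
    diff = x - means
    scores = -0.5 * np.einsum("ij,jk,ik->i", diff, inv, diff) + np.log(priors)
    post = np.exp(scores - scores.max())
    return classes, post / post.sum()

def classify(refs, genotypes, query, k=2):
    """Embed references plus query together, then return the MAP genotype."""
    coords = classical_mds(p_distance_matrix(refs + [query]), k)
    classes, post = lda_posteriors(coords[:-1], genotypes, coords[-1])
    return classes[int(np.argmax(post))], float(post.max())

def loocv_accuracy(refs, genotypes, k=2):
    """Leave each reference out in turn and classify it from the rest."""
    hits = 0
    for i in range(len(refs)):
        rest = refs[:i] + refs[i + 1:]
        rest_y = genotypes[:i] + genotypes[i + 1:]
        pred, _ = classify(rest, rest_y, refs[i], k)
        hits += pred == genotypes[i]
    return hits / len(refs)

# Toy, already-aligned reference sequences for two hypothetical genotypes.
refs = ["AAAAAAAAAAAA", "AAAAAAAAAAAT", "AAAAAAAAAATA", "AAAAAAAAATAA",
        "CCCCCCCCAAAA", "CCCCCCCCAAAT", "CCCCCCCCAATA", "CCCCCCCCATAA"]
genotypes = ["A"] * 4 + ["B"] * 4
query = "CCCCCCCAAAAA"

genotype, posterior = classify(refs, genotypes, query)
print(genotype, round(posterior, 3), loocv_accuracy(refs, genotypes))
```

In this sketch the query is embedded together with the references before the discriminant model is fitted on the reference coordinates alone, mirroring the order of steps in the abstract. The outlier heuristics mentioned there are not covered.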

【 License 】

CC BY
© Kim et al; licensee BioMed Central Ltd. 2010
