Journal Article Details
BMC Bioinformatics
Phylogenetic tree construction using trinucleotide usage profile (TUP)
Proceedings
Tit-Yee Wong [1], Lih-Yuan Deng [2], Behrouz Madahian [2], Dale Bowman [2], Henry Horng-Shing Lu [3], Jyh-Jen Horng Shiau [3], Si Chen [4]
[1] Department of Biological Sciences, University of Memphis, Memphis, TN, USA
[2] Department of Mathematical Sciences, University of Memphis, Memphis, TN, USA
[3] Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
[4] Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education, and School of Pharmaceutical Sciences, Wuhan University, Wuhan, China
Keywords: Feature frequency profile (FFP); Reading frame; Summary statistics; Phylogenetic tree construction; Tree comparison
DOI: 10.1186/s12859-016-1222-3
Source: Springer
【 Abstract 】

Background
Building a genome-wide phylogenetic tree for a large group of species, each containing many genes with long nucleotide sequences, has been a challenging task. The most popular method, called feature frequency profile (FFP-k), finds the frequency distribution of all words of a fixed length k over the whole genome sequence using overlapping windows of that length. For satisfactory results, the recommended word length k ranges from 6 to 15 and need not be a multiple of 3 (the codon length). The total number of possible words needed for FFP-k can therefore range from $4^{6} = 4096$ to $4^{15}$.

Results
We propose a simple improvement over the popular FFP method that uses only the typical word length of 3. The new method, called Trinucleotide Usage Profile (TUP), is based only on the (relative) frequency distribution of words in non-overlapping windows of length 3. The total number of possible words needed for TUP is $4^{3} = 64$, far fewer than the number required at the recommended optimal "resolution" of FFP. To build a phylogenetic tree, we propose first representing each species by a TUP vector and then constructing the tree from an appropriate distance measure between pairs of TUP vectors. In particular, we propose summarizing a DNA sequence by a matrix with three rows, one for each of the three reading frames, recording the frequency distribution of the non-overlapping words of length 3 in each reading frame. We also provide a numerical measure for comparing trees constructed with different methods.

Conclusions
Compared with the FFP method, our empirical study showed that the proposed TUP method is more capable of building phylogenetic trees with stronger biological support. We further provide some justification for this from an information-theoretic viewpoint. Unlike the FFP method, the TUP method takes advantage of the fact that the start of the first reading frame is (usually) known. Without this information, the FFP method can rely only on the frequency distribution of overlapping words, which is the average (or mixture) of the frequency distributions of the three possible reading frames. Consequently, we show (from the entropy viewpoint) that the FFP procedure can dilute important gene information and therefore provides less accurate classification.
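The three-row profile described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`frame_counts`, `tup_matrix`, `overlapping_counts`), the lexicographic ordering of the 64 words, and the decision to skip words containing symbols other than A, C, G, T are all ours, and the input is assumed to start at the first base of reading frame 0.

```python
from itertools import product

# The 4^3 = 64 possible trinucleotides, in a fixed lexicographic order (our choice).
TRINUCS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {word: i for i, word in enumerate(TRINUCS)}


def frame_counts(seq):
    """Count non-overlapping trinucleotides in each of the three reading frames.

    Returns a 3 x 64 list of counts; row f reads the words starting at
    positions f, f + 3, f + 6, ... Words containing symbols other than
    A, C, G, T are skipped (an assumption, not taken from the paper).
    """
    seq = seq.upper()
    rows = []
    for frame in range(3):
        counts = [0] * 64
        for start in range(frame, len(seq) - 2, 3):
            word = seq[start:start + 3]
            if word in INDEX:
                counts[INDEX[word]] += 1
        rows.append(counts)
    return rows


def tup_matrix(seq):
    """TUP-style profile: each frame's counts converted to relative frequencies."""
    matrix = []
    for counts in frame_counts(seq):
        total = sum(counts)
        matrix.append([c / total if total else 0.0 for c in counts])
    return matrix


def overlapping_counts(seq):
    """FFP-3 style counts: trinucleotides read with an overlapping window (step 1)."""
    seq = seq.upper()
    counts = [0] * 64
    for start in range(len(seq) - 2):
        word = seq[start:start + 3]
        if word in INDEX:
            counts[INDEX[word]] += 1
    return counts


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    per_frame = frame_counts(dna)
    pooled = overlapping_counts(dna)
    # Every overlapping window starts in exactly one frame, so the pooled
    # counts are the column sums of the three per-frame rows.
    assert pooled == [sum(col) for col in zip(*per_frame)]
    print("words per frame:", [sum(row) for row in per_frame])
```

Because every overlapping window starts in exactly one of the three frames, the overlapping counts are the sum of the three per-frame count vectors. Dividing by the total N shows that the overlapping relative frequency is the weighted average of the three frame distributions with weights n_f / N, which is the "mixture" the abstract refers to.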

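The entropy argument and the distance step can also be sketched, assuming each species is already summarized as three rows of relative frequencies. The abstract does not name the distance measure, so the Euclidean distance on the flattened matrix below is a hypothetical placeholder, and the helper names (`pooled_profile`, `dilution`, `tup_distance`, `distance_matrix`) are ours. Entropy is measured in bits.

```python
import math


def entropy(p):
    """Shannon entropy, in bits, of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def pooled_profile(rows, weights):
    """Weighted average of the per-frame distributions.

    With weights n_f / N (words in frame f over all windows), this is the
    relative frequency of overlapping words that an FFP-style count sees.
    """
    return [sum(w * row[j] for w, row in zip(weights, rows))
            for j in range(len(rows[0]))]


def dilution(rows, weights):
    """Information lost by pooling frames: H(pooled) - sum_f w_f * H(row_f).

    This equals the mutual information between reading frame and word, so it
    is never negative and (for positive weights) is zero only when all frames
    share the same distribution.
    """
    pooled = entropy(pooled_profile(rows, weights))
    return pooled - sum(w * entropy(row) for w, row in zip(weights, rows))


def tup_distance(a, b):
    """Hypothetical choice: Euclidean distance between flattened TUP matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def distance_matrix(profiles):
    """Pairwise distances for a dict {species name: TUP matrix}."""
    names = sorted(profiles)
    return names, [[tup_distance(profiles[m], profiles[n]) for n in names]
                   for m in names]
```

For a toy case in which each frame uses a different single word, `dilution([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [1/3] * 3)` is log2(3), about 1.585 bits: pooling the frames adds uncertainty that the per-frame profile does not have. The resulting distance matrix could then be passed to a standard distance-based tree method such as neighbor-joining or UPGMA; the abstract does not state which method the authors use.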
【 License 】

CC BY   
© The Author(s) 2016
