Thesis details
Improving gene trees without more data
Gupta, Ashu; Warnow, Tandy
Keywords: Gene trees; Species trees; Binning; Multi-locus bootstrapping (MLBS); BestML; Gene tree estimation; Species tree estimation; Low phylogenetic signal
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/90687/GUPTA-THESIS-2016.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Species tree and gene tree estimation from sequence data are two steps in many biological analyses. Computational challenges and limited amounts of data often make it difficult to estimate highly accurate phylogenetic trees. Moreover, the gene alignments used to estimate trees on individual loci often have low phylogenetic signal (e.g., short alignment length), resulting in poorly estimated gene trees. Species tree estimation, on the other hand, is challenged by individual loci having different evolutionary histories, caused by a biological phenomenon known as incomplete lineage sorting (ILS). In the presence of ILS, summary methods such as MP-EST, ASTRAL2, and ASTRID are often used to estimate the species tree from gene trees. Summary methods operate by combining estimated gene trees and thus suffer in the presence of low phylogenetic signal. To tackle this problem, the Statistical Binning and Weighted Statistical Binning pipelines were designed to improve gene tree estimation, which in turn can improve species tree estimation. Experimental studies of these pipelines showed that they improved both gene tree and species tree estimation. However, these studies tested the statistical binning and weighted statistical binning pipelines only with multi-locus bootstrapping (MLBS) and not with BestML, where MLBS and BestML are different ways to run a phylogenetic pipeline.

In this thesis, a novel phylogenetic pipeline named WSB+WQMC is proposed. This pipeline shares several design features with the weighted statistical binning pipeline (referred to as WSB+CAML in this thesis) but has some other desirable properties. The WSB+WQMC pipeline is also shown to be statistically consistent under the GTR+MSC model when a slightly different version of WQMC is used.

In this study, WSB+WQMC was evaluated and compared with the WSB+CAML pipeline on various simulated datasets using BestML analysis.
Most of the trends seen in MLBS analyses were also observed for WSB+WQMC and WSB+CAML in BestML analyses, with some important differences. WSB+WQMC substantially improved the accuracy of gene tree and species tree estimation using ASTRAL2 and ASTRID on most datasets with low, medium, and moderately high levels of ILS. Compared to WSB+CAML, WSB+WQMC computed less accurate gene trees and species trees in certain model conditions with low and medium levels of ILS. However, WSB+WQMC was found to be better than, or at least as accurate as, WSB+CAML in computing gene trees and species trees on all datasets with moderately high and high ILS levels. WSB+WQMC is also shown to be better at estimating gene trees on certain medium and low ILS datasets. Thus, WSB+WQMC is a potential alternative to WSB+CAML for gene tree and species tree estimation in the presence of low phylogenetic signal.

【 Preview 】
Attachments
Files | Size | Format | View
Improving gene trees without more data | 1007 KB | PDF | download