Journal article details
Journal of computational biology: A journal of computational molecular cell biology
Quartet Based Gene Tree Imputation Using Deep Learning Improves Phylogenomic Analyses Despite Missing Data
article
Sazan Mahbub [1], Shashata Sawmya [1], Arpita Saha [1], Rezwana Reaz [1], M. Sohel Rahman [1], Shamsuzzoha Bayzid [1]
[1] Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology; Department of Computer Science, University of Maryland, College Park
Keywords: gene tree; gene tree discordance; incomplete lineage sorting; quartet consistency; quartet distribution; species tree; missing data; gene tree imputation
DOI: 10.1089/cmb.2022.0212
Subject classification: Biological Sciences (General)
Source: Mary Ann Liebert, Inc. Publishers
【 Abstract 】

Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, for a combination of reasons (ranging from sampling biases to more biological causes, as in gene birth and loss), gene trees are often incomplete, meaning that not all species of interest have a common set of genes. Incomplete gene trees can potentially impact the accuracy of phylogenomic inference. We, for the first time, introduce the problem of imputing the quartet distribution induced by a set of incomplete gene trees, which involves adding the missing quartets back to the quartet distribution. We present Quartet based Gene tree Imputation using Deep Learning (QT-GILD), an automated and specially tailored unsupervised deep learning technique, accompanied by cues from natural language processing, which learns the quartet distribution in a given set of incomplete gene trees and generates a complete set of quartets accordingly. QT-GILD is a general-purpose technique needing no explicit modeling of the subject system or reasons for missing data or gene tree heterogeneity. Experimental studies on a collection of simulated and empirical datasets suggest that QT-GILD can effectively impute the quartet distribution, which results in a dramatic improvement in the species tree accuracy. Remarkably, QT-GILD not only imputes the missing quartets but can also account for gene tree estimation error. Therefore, QT-GILD advances the state-of-the-art in species tree estimation from gene trees in the face of missing data.
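To make the notion of a quartet distribution concrete, the following Python sketch counts the quartet topologies induced by a small set of gene trees and lists the four-taxon sets that an incomplete gene tree cannot resolve. It is only an illustration of the objects the abstract refers to, not the QT-GILD method: it performs no imputation and no learning. The nested-tuple tree encoding, the function names, and the toy trees are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def clades(tree):
    """Return (leaf set of `tree`, leaf sets of all its internal nodes).

    A tree is a leaf name (str) or a tuple of subtrees (hypothetical encoding).
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def quartet(a, b, c, d):
    """Unrooted quartet topology ab|cd, stored order-independently."""
    return frozenset([frozenset([a, b]), frozenset([c, d])])


def induced_quartets(tree):
    """Set of quartet topologies induced by one gene tree."""
    taxa, found = clades(tree)
    result = set()
    for side in found:
        other = taxa - side
        # An edge induces quartets only if both sides hold at least two taxa.
        if len(side) < 2 or len(other) < 2:
            continue
        for a, b in combinations(sorted(side), 2):
            for c, d in combinations(sorted(other), 2):
                result.add(quartet(a, b, c, d))
    return result


def quartet_distribution(gene_trees):
    """Count how many gene trees induce each quartet topology."""
    counts = Counter()
    for tree in gene_trees:
        counts.update(induced_quartets(tree))
    return counts


def missing_taxon_sets(tree, all_taxa):
    """Four-taxon sets that `tree` cannot resolve because a taxon is absent."""
    present, _ = clades(tree)
    return [s for s in combinations(sorted(all_taxa), 4) if not set(s) <= present]


if __name__ == "__main__":
    # Toy input: two gene trees lack taxon E, one is complete.
    gene_trees = [
        (("A", "B"), ("C", "D")),
        (("A", "C"), ("B", "D")),
        ((("A", "B"), "C"), ("D", "E")),
    ]
    for topology, count in quartet_distribution(gene_trees).items():
        print(sorted(sorted(pair) for pair in topology), count)
    print(missing_taxon_sets(gene_trees[0], "ABCDE"))
```

In this toy input, the first two gene trees contribute nothing for any four-taxon set containing E, so the counts for those sets rest on a single tree. Gaps of this kind are what an imputation step would aim to fill before the quartet distribution is passed to a species tree estimator.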

【 License 】

Unknown   
