期刊论文详细信息
Algorithms for Molecular Biology
Heuristic algorithms for best match graph editing
Marc Hellmuth1  David Schaller2  Peter F. Stadler3  Manuela Geiß4 
[1] Department of Mathematics, Faculty of Science, Stockholm University, SE-10691, Stockholm, Sweden;Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04109 Leipzig, Leipzig, Germany;Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, D-04107, Leipzig, Germany;Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04109 Leipzig, Leipzig, Germany;Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, D-04107, Leipzig, Germany;Competence Center for Scalable Data Services and Solutions Dresden/Leipzig, Interdisciplinary Center for Bioinformatics, German Centre for Integrative Biodiversity Research (iDiv), and Leipzig Research Center for Civilization Diseases, Universität Leipzig, Augustusplatz 12, D-04107, Leipzig, Germany;Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, A-1090, Vienna, Austria;Facultad de Ciencias, Universidad National de Colombia, Sede Bogotá, Ciudad Universitaria, COL-111321, Bogotá, D.C., Colombia;Santa Fe Institute, 1399 Hyde Park Rd., NM87501, Santa Fe, USA;Software Competence Center Hagenberg GmbH, Softwarepark 21, A-4232, Hagenberg, Austria;
关键词: Arc modification problems;    Heuristic algorithm;    Consistent algorithm;    NP-hardness;   
DOI  :  10.1186/s13015-021-00196-3
来源: Springer
PDF
【 摘 要 】

BackgroundBest match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates thus will usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data.ResultsSince BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho’s supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing.ConclusionNoisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods.

【 授权许可】

CC BY   

【 预 览 】
附件列表
Files Size Format View
RO202109173513755ZK.pdf 4181KB PDF download
  文献评价指标  
  下载次数:9次 浏览次数:8次