Journal Article Details
Evolutionary Bioinformatics
A Lossy Compression Technique Enabling Duplication-Aware Sequence Alignment
Valerio Freschi
Keywords: duplications; sequence alignment; tandem repeat; compression
DOI: 10.4137/EBO.S9131
Subject classification: Biotechnology
Source: Sage Journals
【 Abstract 】

Despite the recognized importance of tandem duplications in genome evolution, commonly adopted sequence comparison algorithms do not take into account complex mutation events involving more than one residue at a time, since such events violate the underlying assumption that adjacent residues are statistically independent. As a consequence, the presence of tandem repeats in the sequences under comparison may impair the biological significance of the resulting alignment. Although solutions have been proposed, repeat-aware sequence alignment is still considered an open problem, and new efficient and effective methods have been advocated. The present paper describes an alternative lossy compression scheme for genomic sequences that iteratively collapses repeats of increasing length. The resulting approximate representations contain no tandem duplications, yet retain enough information to make their comparison even more significant than the edit distance between the original sequences. This allows traditional alignment algorithms to be applied directly to the compressed sequences. Results confirm the validity of the proposed approach for the problem of duplication-aware sequence alignment.
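
The idea of collapsing repeats of increasing length and then comparing the compressed sequences can be illustrated with a minimal Python sketch. This is not the paper's algorithm, whose details the abstract does not give. It assumes exact (not approximate) tandem repeats, a hypothetical upper bound `max_unit_len` on the repeat unit, and one left-to-right pass per unit length. It also discards copy counts, which the published scheme may handle differently. The names `collapse_repeats`, `compress`, and `edit_distance` are illustrative only.

```python
def collapse_repeats(seq: str, unit_len: int) -> str:
    """Keep one copy of each exact tandem repeat whose unit has length unit_len."""
    out = []
    i = 0
    while i < len(seq):
        unit = seq[i:i + unit_len]
        following = seq[i + unit_len:i + 2 * unit_len]
        if len(following) == unit_len and unit == following:
            # The next unit_len residues repeat the current ones:
            # skip this copy and let the following copy stand in for it.
            i += unit_len
        else:
            out.append(seq[i])
            i += 1
    return "".join(out)


def compress(seq: str, max_unit_len: int) -> str:
    """Collapse repeats of increasing unit length, from 1 up to max_unit_len.

    max_unit_len is an assumed parameter of this sketch, not taken from the paper.
    """
    for unit_len in range(1, max_unit_len + 1):
        seq = collapse_repeats(seq, unit_len)
    return seq


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


# Two sequences that differ only in the copy number of a "GT" repeat.
original_a = "ACGTGTGTA"
original_b = "ACGTA"
compressed_a = compress(original_a, max_unit_len=3)
compressed_b = compress(original_b, max_unit_len=3)
```

In this toy case the two originals are four single-residue edits apart, while their compressed forms are identical, so a standard alignment of the compressed sequences no longer charges the duplication as four independent events.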

【 License 】

Unknown   
