BMC Bioinformatics
Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model
Research Article
Rahul Siddharthan [1], Gayathri Jayaraman [1]
[1] The Institute of Mathematical Sciences, Taramani, Chennai 600 113, India
Keywords: Synthetic Data; Local Alignment; Background Model; Functional Model; Orthologous Sequence
DOI: 10.1186/1471-2105-11-464
Received 2010-03-30; accepted 2010-09-16; published 2010
Source: Springer
【 Abstract 】
Background: Most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions. This is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that the goal of sequence alignment should be the detection of homology and not similarity, we incorporate an evolutionary model into a previously published multiple sequence alignment program for non-coding DNA, Sigma, as a sensitive likelihood-based way to assess the significance of alignments. Version 1 of Sigma was successful in eliminating spurious alignments but exhibited relatively poor sensitivity on synthetic data. Sigma 1 used a p-value (the probability under the "null hypothesis" of non-homology) to assess the significance of alignments and, optionally, a background model that captured short-range genomic correlations. Sigma version 2, described here, retains these features but calculates the p-value using an evolutionary model, and also allows for a transition matrix for different substitution rates from and to different nucleotides. Our evolutionary model takes separate account of mutation and fixation, and can be extended to allow for locally differing functional constraints on sequence.

Results: We demonstrate that, on real and synthetic data, Sigma-2 significantly outperforms other programs in specificity to genuine homology (that is, it minimises alignment of spuriously similar regions that do not have a common ancestry), while it is now as sensitive as the best current programs.

Conclusions: Comparing these results with an extrapolation of the best results from other available programs, we suggest that conservation rates in intergenic DNA are often significantly over-estimated. It is increasingly important to align non-coding DNA correctly, in regulatory genomics and in the context of whole-genome alignment, and Sigma-2 is an important step in that direction.
【 License 】
CC BY
© Jayaraman and Siddharthan; licensee BioMed Central Ltd. 2010