| BMC Bioinformatics | |
| Oculus: faster sequence alignment by streaming read compression | |
| Software | |
| Matthew K Iyer1  Brendan A Veeneman1  Arul M Chinnaiyan2  | |
| [1] Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 48109, Ann Arbor, MI, USA; Michigan Center for Translational Pathology, University of Michigan Medical School, 48109, Ann Arbor, MI, USA; Department of Pathology, University of Michigan Medical School, 48109, Ann Arbor, MI, USA; Howard Hughes Medical Institute, University of Michigan Medical School, 48109, Ann Arbor, MI, USA; Department of Urology, University of Michigan Medical School, 48109, Ann Arbor, MI, USA | |
| Keywords: DNA nucleotide sequence alignment streaming identity redundancy compression software algorithm; | |
| DOI : 10.1186/1471-2105-13-297 | |
| Received 2012-04-10, accepted 2012-11-01, published 2012 | |
| Source: Springer | |
【 Abstract 】
Background: Despite significant advancement in alignment algorithms, the exponential growth of nucleotide sequencing throughput threatens to outpace bioinformatic analysis. Computation may become the bottleneck of genome analysis if growing alignment costs are not mitigated by further improvement in algorithms. Much gain has been gleaned from indexing and compressing alignment databases, but many widely used alignment tools process input reads sequentially and are oblivious to any underlying redundancy in the reads themselves.

Results: Here we present Oculus, a software package that attaches to standard aligners and exploits read redundancy by performing streaming compression, alignment, and decompression of input sequences. This nearly lossless process (> 99.9%) led to alignment speedups of up to 270% across a variety of data sets, while requiring a modest amount of memory. We expect that streaming read compressors such as Oculus could become a standard addition to existing RNA-Seq and ChIP-Seq alignment pipelines, and potentially other applications in the future as throughput increases.

Conclusions: Oculus efficiently condenses redundant input reads and wraps existing aligners to provide nearly identical SAM output in a fraction of the aligner runtime. It includes a number of useful features, such as tunable performance and fidelity options, compatibility with FASTA or FASTQ files, and adherence to the SAM format. The platform-independent C++ source code is freely available online, at http://code.google.com/p/oculus-bio.
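The compress, align, decompress idea described in the abstract can be illustrated with a minimal Python sketch. This is not the Oculus implementation (which is written in C++); the function names `compress_reads` and `align_with_compression` and the `align` callback are hypothetical, and the sketch assumes only exact sequence identity, ignoring quality scores, paired ends, and SAM formatting.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Read = Tuple[str, str]  # (read name, nucleotide sequence); hypothetical layout


def compress_reads(reads: Iterable[Read]) -> Dict[str, List[str]]:
    """Group read names by identical sequence, in first-seen order."""
    groups: Dict[str, List[str]] = {}
    for name, seq in reads:
        groups.setdefault(seq, []).append(name)
    return groups


def align_with_compression(
    reads: Iterable[Read],
    align: Callable[[List[str]], List[int]],
) -> Dict[str, int]:
    """Align each distinct sequence once, then copy the hit to every read.

    `align` stands in for an external aligner: it takes a list of sequences
    and returns one result per sequence, in the same order.
    """
    groups = compress_reads(reads)            # compression step
    unique = list(groups)                     # distinct sequences only
    hits = align(unique)                      # aligner sees no duplicates
    result: Dict[str, int] = {}
    for seq, hit in zip(unique, hits):        # decompression step
        for name in groups[seq]:
            result[name] = hit
    return result


if __name__ == "__main__":
    reference = "ACGTGGCA"

    def toy_align(seqs: List[str]) -> List[int]:
        # Toy stand-in for an aligner: exact substring position, -1 if absent.
        return [reference.find(s) for s in seqs]

    reads = [("r1", "ACGT"), ("r2", "GGCA"), ("r3", "ACGT")]
    print(align_with_compression(reads, toy_align))
```

In this toy run the aligner is handed two sequences rather than three, which is where the saving comes from: the more redundant the input, the fewer sequences reach the aligner.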
【 License 】
Unknown
© Veeneman et al.; licensee BioMed Central Ltd. 2012. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.