PLoS One
HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data
Katherine Sorber1  Michelle T. Dimon2  Joseph L. DeRisi2
[1] Biological and Medical Informatics Program, University of California San Francisco, San Francisco, California, United States of America; Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California, United States of America
Keywords: Introns; Hidden Markov models; Sequence alignment; Sequence motif analysis; Arabidopsis thaliana; Multiple alignment calculation; Alternative splicing; Plasmodium
DOI: 10.1371/journal.pone.0013875
Subject classification: Medicine (general)
Source: Public Library of Science
【 Abstract 】
Background: High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown.

Methodology/Principal Findings: Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets without a reduction in specificity.

Conclusions/Significance: HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on pre-built gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3′ splice sites and 1.4% of 5′ splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction, allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available at http://derisilab.ucsf.edu/software/hmmsplicer.
【 License 】
CC BY