Journal Article Details
BMC Bioinformatics
PASTA: splice junction identification from RNA-Sequencing data
Shaojun Tang [2], Alberto Riva [1]
[1] University of Florida Genetics Institute, University of Florida, Gainesville, FL, USA
[2] Current address: Proteomics Center at Children's Hospital Boston, Boston, MA, USA
Keywords: Computational analysis of alternative splicing; Alternative splicing; Next-generation sequencing; RNA-Seq
DOI: 10.1186/1471-2105-14-116
Received 2012-08-13; accepted 2013-03-19; published 2013
【 Abstract 】

Background

Next-generation transcriptome sequencing (RNA-Seq) is emerging as a powerful experimental tool for the study of alternative splicing and its regulation, but it requires ad hoc analysis methods and tools. PASTA (Patterned Alignments for Splicing and Transcriptome Analysis) is a splice junction detection algorithm designed specifically for RNA-Seq data. It combines a highly accurate alignment strategy with heuristic and statistical methods to identify exon-intron junctions.

Results

Comparisons against TopHat and other splice junction prediction software on real and simulated datasets show that PASTA exhibits high specificity and sensitivity, especially at lower coverage levels. Moreover, PASTA is highly configurable and flexible, and can therefore be applied in a wide range of analysis scenarios: it is able to handle both single-end and paired-end reads, it does not rely on the presence of canonical splicing signals, and it uses organism-specific regression models to accurately identify junctions.
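
To make "canonical splicing signals" concrete: most introns begin with the dinucleotide GT and end with AG, with GC-AG and AT-AC as much rarer alternatives. The sketch below is not PASTA code. It only illustrates the kind of signal check that a tool restricted to canonical junctions would apply and that, according to the abstract, PASTA does not depend on. The function name `classify_junction`, the coordinate convention (0-based, half-open, forward strand), and the category labels are assumptions made for this example.

```python
# Illustrative sketch only; names and conventions are hypothetical, not PASTA's.
CANONICAL = {("GT", "AG")}
MINOR = {("GC", "AG"), ("AT", "AC")}


def classify_junction(genome: str, intron_start: int, intron_end: int) -> str:
    """Classify the splice signals of a candidate intron.

    The intron is assumed to span genome[intron_start:intron_end]
    (0-based, half-open) on the forward strand.
    """
    if intron_start < 0 or intron_end > len(genome):
        raise ValueError("intron coordinates fall outside the sequence")
    if intron_end - intron_start < 4:
        raise ValueError("intron too short to carry both signals")

    donor = genome[intron_start:intron_start + 2].upper()   # first 2 intron bases
    acceptor = genome[intron_end - 2:intron_end].upper()    # last 2 intron bases

    if (donor, acceptor) in CANONICAL:
        return "canonical"
    if (donor, acceptor) in MINOR:
        return "minor"
    return "non-canonical"


if __name__ == "__main__":
    # Toy sequence: exon "AAACC", intron "GTTTTTTTAG", exon "GGG".
    toy = "AAACC" + "GTTTTTTTAG" + "GGG"
    print(classify_junction(toy, 5, 15))
```

A detector that discards everything labelled "non-canonical" cannot report such junctions at all. A method that does not rely on these signals can treat the label as an annotation rather than a filter.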

Conclusions

PASTA is a highly efficient and sensitive tool for identifying splice junctions from RNA-Seq data. Compared to similar programs, it identifies a higher number of real splice junctions and provides highly annotated output files containing detailed information about their location and characteristics. Accurate junction data in turn facilitates the reconstruction of splicing isoforms and the analysis of their expression levels, which will be performed by the remaining modules of the PASTA pipeline, still under development. Use of PASTA can therefore enable the large-scale investigation of transcription and alternative splicing.

【 License 】

2013 Tang and Riva; licensee BioMed Central Ltd.
