BMC Bioinformatics
Detection of splicing events and multiread locations from RNA-seq data based on a geometric-tail (GT) distribution of intron length
Proceedings
Kwong-Sak Leung1, Bing Ni1, Leung-Yau Lo1, Stephen Kwok-Wing Tsui2, Shao-Ke Lou3, Aldrin Kay-Yuen Yim4, Jing-Woei Li4, Hao Qin4, Ting-Fung Chan4
Affiliations: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR; School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR; Hong Kong Bioinformatics Center, The Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR; School of Life Sciences, The Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR
Keywords: Splice Site; Reference Genome; Splice Event; Splice Junction; Intron Length
DOI: 10.1186/1471-2105-12-S5-S2
Source: Springer
【 Abstract 】
Background: RNA sequencing (RNA-seq) measures gene expression levels and permits splicing analysis. Many existing aligners can map millions of sequencing reads onto a reference genome. For reads that map to multiple positions along the reference genome (multireads), these aligners either assign them to a location at random or discard them altogether. Either choice can bias downstream analyses. Meanwhile, challenges remain in aligning reads that span splice junctions. Existing splicing-aware aligners that rely on read counts to identify junction sites are inevitably affected by sequencing depth.

Results: The distance between the aligned positions of paired-end (PE) reads, or between the two parts of a spliced read, depends on the experimental protocol and on gene structure. Here we propose a new method that uses an empirical geometric-tail (GT) distribution of intron lengths to make a rational choice in multiread selection and splice-site detection, based on the aligned distances from PE and spliced reads.

Conclusions: GT models, which combine the sequence similarity from alignment with the probability from the length distribution, can accurately determine the locations of both multireads and spliced reads.
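The idea of weighing alignment similarity against a length distribution can be illustrated with a small sketch. It is not the authors' implementation: the model form (empirical frequencies below a cutoff, geometric decay at or above it), the function names `fit_gt`, `gt_log_prob` and `pick_location`, the `cutoff` and `floor` parameters, and the simple sum of alignment log-score and length log-probability are all assumptions made for illustration.

```python
import math
from collections import Counter


def fit_gt(lengths, cutoff):
    """Fit an assumed geometric-tail model to observed distances.

    Lengths below `cutoff` keep their empirical frequencies; lengths at or
    above `cutoff` are modelled as a geometric distribution on (d - cutoff).
    """
    n = len(lengths)
    head = Counter(x for x in lengths if x < cutoff)
    tail = [x for x in lengths if x >= cutoff]
    tail_mass = len(tail) / n
    # Maximum-likelihood estimate for a geometric on 0, 1, 2, ...:
    # mean = (1 - p) / p, hence p = 1 / (1 + mean).
    mean_excess = sum(x - cutoff for x in tail) / len(tail) if tail else 0.0
    p = 1.0 / (1.0 + mean_excess)
    return {
        "cutoff": cutoff,
        "head": {k: v / n for k, v in head.items()},
        "tail_mass": tail_mass,
        "p": p,
    }


def gt_log_prob(model, d, floor=1e-12):
    """Log-probability of distance d; `floor` avoids log(0) for unseen values."""
    if d < model["cutoff"]:
        prob = model["head"].get(d, 0.0)
    else:
        k = d - model["cutoff"]
        prob = model["tail_mass"] * model["p"] * (1.0 - model["p"]) ** k
    return math.log(max(prob, floor))


def pick_location(candidates, model):
    """Choose among candidate placements of one multiread.

    Each candidate is (label, alignment_log_score, distance). The combined
    score is an unweighted sum, which is an illustrative assumption.
    """
    return max(candidates, key=lambda c: c[1] + gt_log_prob(model, c[2]))


# Toy training distances (made up for the example, not real data).
observed = [100] * 5 + [200] * 3 + [1000, 3000]
model = fit_gt(observed, cutoff=500)

# Two placements with equal alignment scores but different implied distances.
candidates = [("locus_A", -2.0, 100), ("locus_B", -2.0, 50000)]
best = pick_location(candidates, model)
```

With equal alignment scores, the placement whose implied distance is more probable under the fitted model is preferred, so `best` is the `locus_A` candidate here. In practice the relative weighting of the two terms and the choice of cutoff would need to be tuned to the data.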
【 License 】
Unknown
© Lou et al; licensee BioMed Central Ltd. 2011. This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.