BMC Genomics
Sources of bias in measures of allele-specific expression derived from RNA-seq data aligned to a single reference genome
Methodology Article
Kraig R Stevenson, Patricia J Wittkopp, Joseph D Coolon
Affiliations: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA; Department of Molecular, Cellular, and Developmental Biology, University of Michigan, 830 North University Avenue, Ann Arbor, MI 48109, USA
Keywords: Next-generation sequencing; Mapping bias; Drosophila melanogaster; Drosophila simulans; DGRP; Allelic imbalance; Genomics; Gene expression; Illumina
DOI: 10.1186/1471-2164-14-536
Received 2013-02-14; accepted 2013-08-05; published 2013
Source: Springer
【 Abstract 】
Background: RNA-seq can be used to measure allele-specific expression (ASE) by assigning sequence reads to individual alleles; however, relative ASE is systematically biased when sequence reads are aligned to a single reference genome. Aligning sequence reads to both parental genomes can eliminate this bias, but this approach is not always practical, especially for non-model organisms. To improve the accuracy of ASE measured using a single reference genome, we identified properties of differentiating sites responsible for biased measures of relative ASE.
Results: We found that clusters of differentiating sites prevented sequence reads from an alternate allele from aligning to the reference genome, causing a bias in relative ASE favoring the reference allele. This bias increased with greater sequence divergence between alleles. Two changes together almost completely eliminated this systematic bias: increasing the number of mismatches allowed when aligning sequence reads to the reference genome, and restricting analysis to genomic regions with fewer differentiating sites than the number of mismatches allowed. Accuracy of allelic abundance was increased further by excluding differentiating sites within sequence reads that could not be aligned uniquely within the genome (imperfect mappability) and reads that overlapped one or more insertions or deletions (indels) between alleles.
Conclusions: After aligning sequence reads to a single reference genome, excluding differentiating sites that had at least as many neighboring differentiating sites as the number of mismatches allowed, imperfect mappability, and/or one or more indels nearby resulted in measures of allelic abundance comparable to those derived from aligning sequence reads to both parental genomes.
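The three exclusion criteria named in the Conclusions can be sketched as a simple site filter. This is a minimal illustration, not the authors' pipeline: the function name `keep_sites`, its parameters, and in particular the definition of "neighboring" and "nearby" as "within `flank` bases on either side" are assumptions made for the example. The abstract does not state the window the authors used.

```python
from bisect import bisect_left, bisect_right


def keep_sites(sites, max_mismatches, flank, unmappable=frozenset(), indels=()):
    """Return the differentiating sites that pass all three filters.

    sites          -- positions of differentiating sites on one chromosome
    max_mismatches -- mismatches allowed by the aligner
    flank          -- ASSUMPTION: bases on either side counted as "neighboring"
                      or "nearby" (for example, read length minus one)
    unmappable     -- positions flagged as imperfectly mappable (hypothetical input)
    indels         -- positions of indels between the two alleles
    """
    sites = sorted(set(sites))
    indels = sorted(indels)
    kept = []
    for pos in sites:
        # Count the other differentiating sites inside the window around pos.
        lo = bisect_left(sites, pos - flank)
        hi = bisect_right(sites, pos + flank)
        neighbors = (hi - lo) - 1  # subtract the site itself
        if neighbors >= max_mismatches:
            continue  # too many clustered differences for a read to align
        if pos in unmappable:
            continue  # overlapping reads cannot be placed uniquely
        # Find the first indel at or after the left edge of the window.
        i = bisect_left(indels, pos - flank)
        if i < len(indels) and indels[i] <= pos + flank:
            continue  # an indel lies within the window
        kept.append(pos)
    return kept
```

With `sites=[100, 105, 110, 500, 700, 900]`, `max_mismatches=2`, `flank=35`, `unmappable={700}`, and `indels=[920]`, only site 500 survives: the cluster at 100 to 110 has two neighbors per site, 700 is unmappable, and 900 has an indel within the window. Raising `max_mismatches` to 3 readmits the cluster, which mirrors the abstract's point that allowing more mismatches recovers more sites.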
【 License 】
© Stevenson et al.; licensee BioMed Central Ltd. 2013. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.