| BMC Genomics | |
| Generation of SNP datasets for orangutan population genomics using improved reduced-representation sequencing and direct comparisons of SNP calling algorithms | |
| Research Article | |
| Robert H S Kraus1  Maja P Greminger2  Alexander Nater2  Carel P van Schaik2  Michael Krützen2  Natasha Arora2  Ian Singleton3  Andrea Patrignani4  Rémy Bruggmann5  Beatrice Nussberger6  Benoit Goossens7  Reeta Sharma8  Lounes Chikhi9  Laurentius N Ambu1,10  Kai N Stölting1,11  | |
| [1] Conservation Genetics Group, Senckenberg Research Institute and Natural History Museum, Gelnhausen, Germany;Evolutionary Genetics Group, Anthropological Institute and Museum, University of Zurich, Zurich, Switzerland;Foundation for a Sustainable Ecosystem (YEL), Medan, Indonesia;PanEco, Foundation for Sustainable Development and Intercultural Exchange, Berg am Irchel, Switzerland;Functional Genomics Center, University of Zurich, Zurich, Switzerland;Functional Genomics Center, University of Zurich, Zurich, Switzerland;Department of Biology, University of Berne, Berne, Switzerland;Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland;Organisms and Environment Division, School of Biosciences, Cardiff University, Cardiff, UK;Danau Girang Field Centre, c/o Sabah Wildlife Department, Kota Kinabalu, Sabah, Malaysia;Sabah Wildlife Department, Kota Kinabalu, Sabah, Malaysia;Population and Conservation Genetics, Instituto Gulbenkian de Ciencia, Oeiras, Portugal;Population and Conservation Genetics, Instituto Gulbenkian de Ciencia, Oeiras, Portugal;CNRS, Laboratoire Evolution and Diversité Biologique, Université Paul Sabatier, Toulouse, France;Université de Toulouse, Toulouse, France;Sabah Wildlife Department, Kota Kinabalu, Sabah, Malaysia;Unit of Ecology & Evolution, Department of Biology, University of Fribourg, Fribourg, Switzerland; | |
| Keywords: Next-generation sequencing; Single-nucleotide polymorphisms; Reduced-representation libraries; Bioinformatics; GATK; SAMtools; CLC Genomics Workbench; Great apes | |
| DOI : 10.1186/1471-2164-15-16 | |
| Received 2013-04-08, accepted 2013-12-21, published 2014 | |
| Source: Springer | |
【 Abstract 】
Background: High-throughput sequencing has opened up exciting possibilities in population and conservation genetics by enabling the assessment of genetic variation at genome-wide scales. One approach to reduce genome complexity, i.e. investigating only parts of the genome, is reduced-representation library (RRL) sequencing. Like similar approaches, RRL sequencing reduces ascertainment bias due to simultaneous discovery and genotyping of single-nucleotide polymorphisms (SNPs) and does not require reference genomes. Yet generating such datasets remains challenging because of laboratory and bioinformatic issues. In the laboratory, current protocols need to be improved so that homologous fragments are sequenced more consistently, which reduces the number of missing genotypes. From the bioinformatic perspective, the reliance of most studies on a single SNP caller disregards the possibility that different algorithms may produce disparate SNP datasets.

Results: We present an improved RRL (iRRL) protocol that maximizes the generation of homologous DNA sequences, thus achieving improved genotyping-by-sequencing efficiency. Our modifications facilitate generation of single-sample libraries, enabling individual genotype assignments instead of pooled-sample analysis. We sequenced ~1% of the orangutan genome with 41-fold median coverage in 31 wild-born individuals from two populations. SNPs and genotypes were called using three different algorithms. We obtained substantially different SNP datasets depending on the SNP caller. Genotype validations revealed that the Unified Genotyper of the Genome Analysis Toolkit and SAMtools performed significantly better than a caller from CLC Genomics Workbench (CLC). Of all conflicting genotype calls, CLC was correct in only 17% of cases. Furthermore, conflicting genotypes between two algorithms showed a systematic bias: one caller almost exclusively assigned heterozygotes, while the other almost exclusively assigned homozygotes.

Conclusions: Our enhanced iRRL approach greatly facilitates genotyping-by-sequencing and thus direct estimates of allele frequencies. Our direct comparison of three commonly used SNP callers emphasizes the need to question the accuracy of SNP and genotype calling, as we obtained considerably different SNP datasets depending on caller algorithms, sequencing depths and filtering criteria. These differences affected scans for signatures of natural selection, and will also exert undue influence on demographic inferences. This study presents the first effort to generate a population genomic dataset for wild-born orangutans with known population provenance.
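The kind of caller-to-caller comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the `(sample, site)` keying, and the toy genotypes are all hypothetical, and real call sets would first be parsed from each caller's output. The sketch tallies concordant genotypes and groups conflicting ones by zygosity class, which is how a heterozygote-versus-homozygote bias between two callers would become visible.

```python
from collections import Counter


def zygosity(genotype):
    """Classify an unphased diploid genotype such as 'A/G' as 'hom' or 'het'."""
    first, second = genotype.split("/")
    return "hom" if first == second else "het"


def compare_callers(calls_a, calls_b):
    """Tally concordant and conflicting genotypes between two call sets.

    calls_a, calls_b: dicts mapping (sample, site) -> genotype string, e.g. 'A/G'.
    Only (sample, site) pairs genotyped by both callers are compared.
    """
    tally = Counter()
    for key in calls_a.keys() & calls_b.keys():
        gt_a, gt_b = calls_a[key], calls_b[key]
        # Allele order carries no meaning for unphased genotypes.
        if sorted(gt_a.split("/")) == sorted(gt_b.split("/")):
            tally["concordant"] += 1
        else:
            # Record the conflict as (class from caller A, class from caller B).
            tally[(zygosity(gt_a), zygosity(gt_b))] += 1
    return tally


# Hypothetical toy data: two samples, two sites, two callers.
caller_1 = {("s1", 100): "A/G", ("s1", 200): "C/C",
            ("s2", 100): "A/G", ("s2", 200): "C/T"}
caller_2 = {("s1", 100): "A/A", ("s1", 200): "C/C",
            ("s2", 100): "G/A", ("s2", 200): "C/C"}

tally = compare_callers(caller_1, caller_2)
print(tally)
```

In this toy example every conflict falls in the `("het", "hom")` cell, meaning the first caller assigned a heterozygote wherever the second assigned a homozygote. Deciding which caller is correct would still require independent genotype validation, as in the study.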
【 License 】
CC BY
© Greminger et al.; licensee BioMed Central Ltd. 2014