BMC Bioinformatics
WinHAP2: an extremely fast haplotype phasing program for long genotype sequences
Weihua Pan [1], Yanan Zhao [1], Yun Xu [1], Fengfeng Zhou [2]
[1] Anhui Province-MOST Co-Key Laboratory of High Performance Computing and Its Application, University of Science and Technology of China, Hefei, Anhui 230027, P.R. China
[2] Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, P.R. China
Keywords: Parallel computing; Long sequence; SNP; Genotype; Haplotype phasing
DOI: 10.1186/1471-2105-15-164
Received 2013-10-23, accepted 2014-05-14, published 2014
【 Abstract 】
Background
The haplotype phasing problem tries to screen for phenotype-associated genomic variations from millions of candidate data points. Most current programs for this problem demand substantial computing power and memory. By replacing the computation-intensive step of constructing the maximum spanning tree with a heuristic based on an estimated initial haplotype, we previously released the WinHAP algorithm version 1.0, which outperforms the other algorithms in both running speed and overall accuracy.
Results
This work further speeds up the WinHAP algorithm, yielding version 2.0 (WinHAP2), by using a divide-and-conquer strategy and OpenMP parallel computing. WinHAP2 can phase 500 genotypes with 1,000,000 SNPs using just 12.8 MB of memory and 2.5 hours on a personal computer, whereas the other programs require unacceptable memory or running times. The parallel running mode further improves WinHAP2's running speed, to several orders of magnitude faster than the other programs, including Beagle, SHAPEIT2 and 2SNP.
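The divide-and-conquer idea can be illustrated with a minimal sketch. It is not the WinHAP2 implementation: the function names, the block size, and the 0/1/2 genotype encoding (0 and 1 for the two homozygous states, 2 for heterozygous) are assumptions made here for illustration, a Python thread pool stands in for OpenMP threads, and `phase_block` is a hypothetical placeholder that assigns heterozygous sites arbitrarily instead of inferring phase. The sketch only shows how a long genotype can be cut into blocks, processed independently, and concatenated; ligating phase across block boundaries is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def split_blocks(n_snps, block_size):
    """Return (start, end) index pairs covering n_snps in fixed-size blocks."""
    return [(s, min(s + block_size, n_snps)) for s in range(0, n_snps, block_size)]


def phase_block(genotype_block):
    """Hypothetical placeholder phaser for one block.

    Assumed encoding: 0 or 1 = homozygous, 2 = heterozygous.
    Homozygous sites are copied to both haplotypes; heterozygous sites
    are assigned arbitrarily (0 to haplotype a, 1 to haplotype b).
    A real phaser would infer this assignment from the population.
    """
    hap_a, hap_b = [], []
    for g in genotype_block:
        if g == 2:
            hap_a.append(0)
            hap_b.append(1)
        else:
            hap_a.append(g)
            hap_b.append(g)
    return hap_a, hap_b


def phase_long_genotype(genotype, block_size=4, workers=2):
    """Divide the genotype into blocks, process them concurrently, concatenate."""
    bounds = split_blocks(len(genotype), block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves block order, so results can be joined directly.
        results = list(pool.map(lambda b: phase_block(genotype[b[0]:b[1]]), bounds))
    hap_a = [allele for block in results for allele in block[0]]
    hap_b = [allele for block in results for allele in block[1]]
    return hap_a, hap_b
```

Because each block is handled independently, only one block per worker needs to be held in working memory at a time, which is the general reason a block-wise scheme can keep memory use small for very long sequences.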
Conclusions
WinHAP2 is an extremely fast haplotype phasing program that can handle large-scale genotyping studies with any number of SNPs reported in the current literature or likely in the near future.
【 License 】
2014 Pan et al.; licensee BioMed Central Ltd.