期刊论文详细信息
BMC Bioinformatics
Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm
Soyeon Ahn1  Haris Vikalo1 
[1] Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin 78712, Texas, USA
关键词: Genotype calling;    Deterministic sequential Monte Carlo;    Haplotype assembly;   
Others  :  1230726
DOI  :  10.1186/s12859-015-0651-8
 received in 2014-12-25, accepted in 2015-06-26,  发布年份 2015
【 摘 要 】

Background

Genetic variations predispose individuals to hereditary diseases, play important role in the development of complex diseases, and impact drug metabolism. The full information about the DNA variations in the genome of an individual is given by haplotypes, the ordered lists of single nucleotide polymorphisms (SNPs) located on chromosomes. Affordable high-throughput DNA sequencing technologies enable routine acquisition of data needed for the assembly of single individual haplotypes. However, state-of-the-art high-throughput sequencing platforms generate data that is erroneous, which induces uncertainty in the SNP and genotype calling procedures and, ultimately, adversely affect the accuracy of haplotyping. When inferring haplotype phase information, the vast majority of the existing techniques for haplotype assembly assume that the genotype information is correct. This motivates the development of methods capable of joint genotype calling and haplotype assembly.

Results

We present a haplotype assembly algorithm, ParticleHap, that relies on a probabilistic description of the sequencing data to jointly infer genotypes and assemble the most likely haplotypes. Our method employs a deterministic sequential Monte Carlo algorithm that associates single nucleotide polymorphisms with haplotypes by exhaustively exploring all possible extensions of the partial haplotypes. The algorithm relies on genotype likelihoods rather than on often erroneously called genotypes, thus ensuring a more accurate assembly of the haplotypes. Results on both the 1000 Genomes Project experimental data as well as simulation studies demonstrate that the proposed approach enables highly accurate solutions to the haplotype assembly problem while being computationally efficient and scalable, generally outperforming existing methods in terms of both accuracy and speed.

Conclusions

The developed probabilistic framework and sequential Monte Carlo algorithm enable joint haplotype assembly and genotyping in a computationally efficient manner. Our results demonstrate fast and highly accurate haplotype assembly aided by the re-examination of erroneously called genotypes.

A C code implementation of ParticleHap will be available for download from https://sites.google.com/site/asynoeun/particlehap.

【 授权许可】

   
2015 Ahn and Vikalo; licensee BioMed Central.

附件列表
Files Size Format View
Fig. 2. 24KB Image download
Fig. 1. 13KB Image download
Fig. 2. 24KB Image download
Fig. 1. 13KB Image download
【 图 表 】

Fig. 1.

Fig. 2.

Fig. 1.

Fig. 2.

【 参考文献 】
  • [1]Nielsen R, Paul JS, Albrechtsen A, Song YS. Genotype and snp calling from next-generation sequencing data. Nat Rev Genet. 2011; 12(6):443-51.
  • [2]Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM et al.. A map of human genome variation from population-scale sequencing. Nature. 2010; 467(7319):1061-73.
  • [3]Hoehe MR, Köpke K, Wendel B, Rohde K, Flachmeier C, Kidd KK et al.. Sequence variability and candidate gene analysis in complex disease: association of μ opioid receptor gene variation with substance dependence. Hum Mol Genet. 2000; 9(19):2895-908.
  • [4]Gibbs RA, Belmont JW, Hardenbol P, Willis TD, Yu F, Yang H et al.. The international hapmap project. Nature. 2003; 426(6968):789-96.
  • [5]Schwartz R et al.. Theory and algorithms for the haplotype assembly problem. Commun Inf Syst. 2010; 10(1):23-38.
  • [6]Lancia G, Bafna V, Istrail S, Lippert R, Schwartz R. Snps problems, complexity, and algorithms. Lecture Notes Comput Sci. 2001; 2161:182-93.
  • [7]Lippert R, Schwartz R, Lancia G, Istrail S. Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem. Brief Bioinform. 2002; 3(1):23-31.
  • [8]Cilibrasi R, van Iersel L, Kelk S, Tromp J. On the complexity of several haplotyping problems. Algorithms Bioinformatics. 2005; 3692:128-39.
  • [9]Panconesi A, Sozio M. Fast hare: A fast heuristic for single individual snp haplotype reconstruction. Algorithms Bioinformatics. 2004; 3240:266-77.
  • [10]Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP et al.. The diploid genome sequence of an individual human. PLoS Biol. 2007; 5(10):254.
  • [11]Chen Z, Fu B, Schweller R, Yang B, Zhao Z, Zhu B. Linear time probabilistic algorithms for the singular haplotype reconstruction problem from snp fragments. J Comput Biol. 2008; 15(5):535-46.
  • [12]Zhao YY, Wu LY, Zhang JH, Wang RS, Zhang XS. Haplotype assembly from aligned weighted snp fragments. Comput Biol Chem. 2005; 29(4):281-7.
  • [13]Wang Y, Feng E, Wang R. A clustering algorithm based on two distance functions for mec model. Comput Biol Chem. 2007; 31(2):148-50.
  • [14]Wang RS, Wu LY, Li ZP, Zhang XS. Haplotype reconstruction from snp fragments by minimum error correction. Bioinformatics. 2005; 21(10):2456-462.
  • [15]Geraci F. A comparison of several algorithms for the single individual snp haplotyping reconstruction problem. Bioinformatics. 2010; 26(18):2217-225.
  • [16]Bansal V, Halpern AL, Axelrod N, Bafna V. An mcmc algorithm for haplotype assembly from whole-genome sequence data. Genome Res. 2008; 18(8):1336-46.
  • [17]Bansal V, Bafna V. Hapcut: an efficient and accurate algorithm for the haplotype assembly problem. Bioinformatics. 2008; 24(16):153-9.
  • [18]Duitama J, Huebsch T, McEwen G, Suk EK, Hoehe MR. Refhap: A reliable and fast algorithm for single individual haplotyping. Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology, BCB ’10. ACM, New York, NY, USA; 2010.
  • [19]He D, Choi A, Pipatsrisawat K, Darwiche A, Eskin E. Optimal algorithms for haplotype assembly from whole-genome sequence data. Bioinformatics. 2010; 26(12):183-90.
  • [20]Chen ZZ, Deng F, Wang L. Exact algorithms for haplotype assembly from whole-genome sequence data. Bioinformatics. 2013; 29(16):349.
  • [21]Deng F, Cui W, Wang L. A highly accurate heuristic algorithm for the haplotype assembly problem. BMC Genomics. 2013; 14 Suppl 2:2.
  • [22]Xie M, Wang J, Jiang T. A fast and accurate algorithm for single individual haplotyping. BMC Syst Biol. 2012; 6 Suppl 2:8. BioMed Central Full Text
  • [23]Bayzid MS, Alam MM, Mueen A, Rahman MS. Hmec: A heuristic algorithm for individual haplotyping with minimum error correction. ISRN Bioinformatics. 2013; 2013:10.
  • [24]Aguiar D, Istrail S. Hapcompass: a fast cycle basis algorithm for accurate haplotype assembly of sequence data. J Comput Biol. 2012; 19(6):577-90.
  • [25]Aguiar D, Istrail S. Haplotype assembly in polyploid genomes and identical by descent shared tracts. Bioinformatics. 2013; 29(13):352-60.
  • [26]Li LM, Kim JH, Waterman MS. Haplotype reconstruction from snp alignment. J Comput Biol. 2004; 11(2-3):505-16.
  • [27]Kim JH, Waterman MS, Li LM. Diploid genome reconstruction of ciona intestinalis and comparative analysis with ciona savignyi. Genome Res. 2007; 17(7):1101-10.
  • [28]Matsumoto H, Kiryu H. Mixsih: a mixture model for single individual haplotyping. BMC Genomics. 2013; 14 Suppl 2:5.
  • [29]Arulampalam MS, Maskell S, Gordon N, Clapp T. A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking. Signal Process IEEE Trans. 2002; 50(2):174-88.
  • [30]Fearnhead P. Sequential monte carlo methods in filter theory. Ph. D. thesis, University of Oxford. 1998.
  • [31]Punskaya E. Sequential monte carlo methods for digital communications. Ph. D. thesis, University of Cambridge. 2003.
  • [32]Liang KC, Wang X, Anastassiou D. A profile-based deterministic sequential monte carlo algorithm for motif discovery. Bioinformatics. 2008; 24(1):46-55.
  • [33]Liang K-C, Wang X. A deterministic sequential monte carlo method for haplotype inference. Selected Topics Signal Process IEEE J. 2008; 2(3):322-31.
  文献评价指标  
  下载次数:15次 浏览次数:17次