Journal article details
BMC Medical Genomics
Hybridization and amplification rate correction for Affymetrix SNP arrays
Minghua Deng [3], Lin Wan [2], Minping Qian [1], Peichao Peng [1], Quan Wang [4]
[1] LMAM, School of Mathematical Sciences, Peking University, Beijing, 100871, People's Republic of China; National Center for Mathematics and Interdisciplinary Sciences, and the Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, People's Republic of China; Center for Statistical Science, Peking University, Beijing, 100871, People's Republic of China; Center for Theoretical Biology, Peking University, Beijing, 100871, People's Republic of China
Keywords: Genomic waves; Cross-hybridization; Copy number variation (CNV); SNP array
Others  :  1134838
DOI  :  10.1186/1755-8794-5-24
Received 2012-02-21, accepted 2012-06-12, published 2012
【 Abstract 】

Background

Copy number variation (CNV) is essential to understanding the pathology of many complex diseases at the DNA level. CNV studies based on the widely used Affymetrix SNP arrays depend heavily on accurate copy number (CN) estimation. Nevertheless, CN estimation may be biased by several factors, including cross-hybridization, training sample batch, and genomic waves of intensities induced by sequence-dependent hybridization rate and amplification efficiency. Because many available algorithms address only one or two of these three factors, CNV identification often suffers from a high false discovery rate (FDR). We have therefore developed a new CNV detection pipeline based on hybridization and amplification rate correction (CNVhac).

Methods

CNVhac first estimates the allelic concentrations (ACs) of target sequences using sample-independent parameters trained under a physicochemical hybridization law. The raw CN at each site is then estimated as the ratio of the sample's AC to the average AC of a reference sample set at that site. Finally, a hidden Markov model (HMM) segmentation process is applied to detect CNV regions.
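
The last two steps can be illustrated with a minimal sketch. It is not the CNVhac implementation: the AC estimation step is taken as given, and the function names, the scaling by a diploid reference (`ploidy=2`), the three-state Gaussian-emission HMM, and all parameter values (`state_means`, `sigma`, `stay_prob`) are assumptions chosen only for illustration.

```python
import math

def raw_copy_number(sample_ac, reference_acs, ploidy=2.0):
    """Per-site raw CN: sample AC over the mean reference AC at that site.

    Scaling by `ploidy` assumes the reference set is diploid on average
    (an assumption of this sketch, not stated in the abstract).
    """
    raw = []
    for i in range(len(sample_ac)):
        ref_mean = sum(ref[i] for ref in reference_acs) / len(reference_acs)
        raw.append(ploidy * sample_ac[i] / ref_mean)
    return raw

def viterbi_segment(raw_cn, state_means=(1.0, 2.0, 3.0), sigma=0.3, stay_prob=0.98):
    """Most likely CN state per site under a toy HMM (hypothetical parameters)."""
    k = len(state_means)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (k - 1))

    def log_emit(x, mean):
        # Gaussian log-density, dropping the constant shared by all states
        return -0.5 * ((x - mean) / sigma) ** 2

    def log_trans(p, j):
        return log_stay if p == j else log_switch

    # Uniform prior over states at the first site
    score = [math.log(1.0 / k) + log_emit(raw_cn[0], m) for m in state_means]
    back = []
    for x in raw_cn[1:]:
        new_score, pointers = [], []
        for j, m in enumerate(state_means):
            best_prev = max(range(k), key=lambda p: score[p] + log_trans(p, j))
            new_score.append(score[best_prev] + log_trans(best_prev, j) + log_emit(x, m))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [state_means[s] for s in path]

# Toy data: seven sites, two reference samples, a three-site loss in the middle
sample_ac = [1.0, 1.1, 0.5, 0.45, 0.55, 1.0, 0.9]
reference_acs = [[0.9] * 7, [1.1] * 7]
raw = raw_copy_number(sample_ac, reference_acs)
segments = viterbi_segment(raw)
```

In this toy run the noisy raw values around 1 at the three middle sites are grouped into a single loss segment, while the small fluctuations around 2 elsewhere are absorbed into the normal state, because the high `stay_prob` penalizes frequent state changes.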

Results

Based on public HapMap data, the results show that CNVhac effectively smooths genomic waves and yields more accurate raw CN estimates than other methods. Moreover, CNVhac alleviates, to a certain extent, the sample dependence of inference and calls CNVs with appreciably low FDRs.

Conclusion

CNVhac is an effective approach to address the common difficulties in SNP array analysis, and the working principles of CNVhac can be easily extended to other platforms.

【 License 】

   
2012 Wang et al.; licensee BioMed Central Ltd.
