Journal Article Details
BMC Bioinformatics
BCRgt: a Bayesian cluster regression-based genotyping algorithm for the samples with copy number alterations
Shengping Yang [2], Xiangqin Cui [1], Zhide Fang [3]
[1] Department of Biostatistics, University of Alabama at Birmingham, 1665 University Blvd, Birmingham, AL 35294, USA
[2] Department of Pathology, School of Medicine, Texas Tech University Health Science Center, Lubbock, Texas, USA
[3] Biostatistics Program, School of Public Health, LSU Health Sciences Center, 2020 Gravier Street, New Orleans, LA 70115, USA
Keywords: SNP array; Genotyping; Copy number alteration; Bayesian cluster regression
DOI: 10.1186/1471-2105-15-74
Received 2013-06-03, accepted 2014-03-10, published 2014
【 Abstract 】

Background

Accurate genotype calling is a prerequisite for a successful Genome-Wide Association Study (GWAS). Although most genotyping algorithms achieve an accuracy rate greater than 99% on DNA samples without copy number alterations (CNAs), almost none of them is designed for genotyping tumor samples, which are known to have large regions of CNAs.

Results

This study aims to develop a statistical method that can accurately genotype tumor samples with CNAs. The proposed method adds a Bayesian layer to a cluster regression model and is termed the Bayesian Cluster Regression-based genotyping algorithm (BCRgt). We demonstrate that, when CNAs do not exist, high concordance rates with HapMap calls can be achieved without using reference/training samples. Adding a training step yields higher genotyping concordance rates without requiring large sample sizes. When CNAs exist in the samples, accuracy is dramatically improved in regions with DNA copy loss and slightly improved in regions with copy number gain, compared with the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM).
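
The concordance rate used as the evaluation metric above can be illustrated with a minimal sketch. The function name `concordance_rate`, the genotype encoding ("AA", "AB", "BB"), and the choice to exclude no-calls (`None`) from the denominator are assumptions made for illustration; the abstract does not specify how the authors handle these details.

```python
from typing import Optional, Sequence


def concordance_rate(
    calls: Sequence[Optional[str]],
    reference: Sequence[Optional[str]],
) -> float:
    """Fraction of SNPs with identical genotype calls.

    Assumption: SNPs with a missing call (None) in either set are
    excluded from the comparison rather than counted as discordant.
    """
    if len(calls) != len(reference):
        raise ValueError("call vectors must have the same length")
    # Keep only SNPs that were called in both sets.
    compared = [
        (c, r) for c, r in zip(calls, reference)
        if c is not None and r is not None
    ]
    if not compared:
        raise ValueError("no SNPs were called in both sets")
    # Count exact genotype matches among the compared SNPs.
    matches = sum(c == r for c, r in compared)
    return matches / len(compared)


# Hypothetical toy data: five SNPs, one no-call in the algorithm's output.
calls = ["AA", "AB", "BB", None, "AB"]
reference = ["AA", "AB", "AB", "BB", "AB"]
rate = concordance_rate(calls, reference)
```

Here four SNPs are compared and three agree. Counting no-calls as discordant instead would lower the rate, so the convention should be stated whenever concordance figures are compared across algorithms.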

Conclusions

We have demonstrated that BCRgt can provide accurate genotype calls for tumor samples with CNAs.

【 License 】

© 2014 Yang et al.; licensee BioMed Central Ltd.
