Journal Article Details
BMC Bioinformatics
Novel genetic matching methods for handling population stratification in genome-wide association studies
André Lacour [4], Vitalia Schüller [4], Dmitriy Drichel [4], Christine Herold [4], Frank Jessen [2], Markus Leber [3], Wolfgang Maier [2], Markus M Noethen [1], Alfredo Ramirez [2], Tatsiana Vaitsiakhovich [3], Tim Becker [3]
[1] Institut für Humangenetik and Life & Brain Center, Universität Bonn, Sigmund-Freud-Str. 25, Bonn 53127, Germany
[2] Abteilung für Psychiatrie und Psychotherapie, Universitätsklinikum Bonn, Sigmund-Freud-Str. 25, Bonn 53127, Germany
[3] Institut für Medizinische Biometrie, Informatik und Epidemiologie, Universität Bonn, Sigmund-Freud-Str. 25, Bonn 53127, Germany
[4] German Center for Neurodegenerative Diseases (DZNE), Sigmund-Freud-Str. 25, Bonn 53127, Germany
Keywords: structured association; genetic matching; population stratification; genome-wide association studies
Others: 1139036
DOI: 10.1186/s12859-015-0521-4
Received 2014-07-11; accepted 2015-02-27; published 2015
Abstract

Background

A commonly encountered problem in association studies is population stratification. In this work, we propose a novel framework for population matching in the context of genome-wide and sequencing association studies. We employ pairwise and groupwise optimal case-control matchings and present an agglomerative hierarchical clustering, both based on a genetic similarity score matrix. To ensure that the matches obtained from the matching algorithm correctly capture the population structure, we propose and discuss two stratum validation methods. We also propose an extension of the Cochran-Armitage trend test that explicitly takes the particular population structure into account.
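The pairwise matching step can be illustrated with a minimal sketch. The abstract does not define the paper's genetic similarity score, groupwise matching, clustering, or stratum validation, so the following rests on hypothetical choices: genotypes coded as 0/1/2 minor-allele counts, a mean identity-by-state (IBS) allele-sharing score as the similarity, and 1:1 case-control matching that maximizes total similarity using a textbook Hungarian (Kuhn-Munkres) assignment algorithm. All function names are illustrative, not the authors' code.

```python
"""Sketch: 1:1 optimal case-control matching on a genetic similarity matrix.

Hypothetical assumptions (not taken from the paper): genotypes are 0/1/2
minor-allele counts, similarity is mean IBS allele sharing, and matching
maximizes total similarity via a textbook Hungarian algorithm.
"""
from typing import List

Genotypes = List[List[int]]


def ibs_similarity(a: List[int], b: List[int]) -> float:
    """Mean IBS allele sharing in [0, 1]: each SNP shares 2 - |a - b| of 2 alleles."""
    shared = sum(2 - abs(x - y) for x, y in zip(a, b))
    return shared / (2 * len(a))


def similarity_matrix(cases: Genotypes, controls: Genotypes) -> List[List[float]]:
    """Rows are cases, columns are controls."""
    return [[ibs_similarity(c, k) for k in controls] for c in cases]


def hungarian_min(cost: List[List[float]]) -> List[int]:
    """Minimum-cost assignment of each row to a distinct column (rows <= columns).

    Kuhn-Munkres with row/column potentials; returns, for each row i,
    the index of the column assigned to it.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "need at least as many controls as cases"
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-based; index 0 unused)
    v = [0.0] * (m + 1)  # column potentials; column 0 is a virtual column
    p = [0] * (m + 1)    # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)  # predecessor column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column
                break
        # Augment: flip matches along the path back to the virtual column 0.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def optimal_pairs(cases: Genotypes, controls: Genotypes) -> List[int]:
    """Pair each case with a distinct control, maximizing total IBS similarity."""
    sim = similarity_matrix(cases, controls)
    cost = [[1.0 - s for s in row] for row in sim]  # max similarity == min (1 - similarity)
    return hungarian_min(cost)
```

Entry i of the result is the index of the control matched to case i. In a structured association analysis, such matched pairs (or groups) would then act as strata for the association test; a groupwise variant or a clustering step would replace the 1:1 assignment.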

Results

We assess our framework by simulating genotype data under the null hypothesis to confirm that it correctly controls the type I error rate. A power study shows that structured association testing with our framework has reasonable power. We compare our results with those obtained from a logistic regression model with principal-component covariates. With the principal-component approach, we also find a possible false-positive association with Alzheimer’s disease that is supported neither by our new methods, nor by the results of a recent large meta-analysis, nor by a mixed-model approach.
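Why type I error control is the first thing to check can be shown with a small null simulation. This is an assumption-laden sketch, not the authors' simulation design or their Cochran-Armitage extension: it uses two hypothetical subpopulations with known labels (in the framework above, strata would come from matching or clustering) and compares the ordinary pooled trend test with the standard Mantel-type stratified version, which sums per-stratum scores and variances before forming a 1-df chi-square statistic. Allele frequencies, case fractions, sample sizes and the seed are arbitrary illustrative values.

```python
"""Sketch: type I error of the Cochran-Armitage trend test under stratification.

Hypothetical setup: two strata with different allele frequencies and case
fractions, HWE genotypes, and no genotype effect on disease (null). The
stratified statistic is the Mantel-type sum of per-stratum scores/variances.
"""
import math
import random
from typing import Dict, List, Tuple


def trend_score(genos: List[int], status: List[int]) -> Tuple[float, float]:
    """Score U and its null variance for the trend test with scores 0/1/2."""
    n = len(genos)
    r = sum(status)
    if n == 0 or r in (0, n):
        return 0.0, 0.0  # stratum without both cases and controls carries no information
    g_bar = sum(genos) / n
    u = sum(y * (g - g_bar) for g, y in zip(genos, status))
    var = (r / n) * (1 - r / n) * sum((g - g_bar) ** 2 for g in genos)
    return u, var


def chi2_1df_pvalue(stat: float) -> float:
    """Upper tail of chi-square(1): P(Z^2 > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


def trend_test(data: List[Tuple[int, int, int]], stratified: bool) -> float:
    """p-value for (stratum, genotype, case_status) tuples."""
    groups: Dict[int, Tuple[List[int], List[int]]] = {}
    for s, g, y in data:
        key = s if stratified else 0  # pool all individuals when not stratified
        genos, status = groups.setdefault(key, ([], []))
        genos.append(g)
        status.append(y)
    u_total = var_total = 0.0
    for genos, status in groups.values():
        u, var = trend_score(genos, status)
        u_total += u
        var_total += var
    if var_total == 0.0:
        return 1.0
    return chi2_1df_pvalue(u_total ** 2 / var_total)


def simulate(rng: random.Random, n_per_stratum: int = 500) -> List[Tuple[int, int, int]]:
    """Null data: disease risk depends on the stratum only, never on genotype."""
    strata = [(0.1, 0.2), (0.5, 0.8)]  # (allele frequency, case fraction), hypothetical
    data = []
    for s, (freq, case_frac) in enumerate(strata):
        for _ in range(n_per_stratum):
            g = (rng.random() < freq) + (rng.random() < freq)  # HWE genotype 0/1/2
            y = int(rng.random() < case_frac)
            data.append((s, g, y))
    return data


def type1_error(n_reps: int = 200, alpha: float = 0.05, seed: int = 7) -> Tuple[float, float]:
    """Rejection rates (pooled, stratified) over n_reps null replicates."""
    rng = random.Random(seed)
    pooled = strat = 0
    for _ in range(n_reps):
        data = simulate(rng)
        pooled += trend_test(data, stratified=False) < alpha
        strat += trend_test(data, stratified=True) < alpha
    return pooled / n_reps, strat / n_reps


if __name__ == "__main__":
    pooled_rate, strat_rate = type1_error()
    print(f"pooled trend test rejection rate:     {pooled_rate:.3f}")
    print(f"stratified trend test rejection rate: {strat_rate:.3f}")
```

Because both disease frequency and allele frequency differ between the strata, the pooled test should reject far more often than the nominal 5%, while the stratified test should stay close to it. A power study would repeat the same loop with a genotype effect added to the case probability.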

Conclusions

Matching methods offer an alternative way to handle confounding due to population stratification in statistical tests for which covariates are hard to model. As a benchmark, we show that our matching framework performs as well as state-of-the-art models on common variants.

License

© 2015 Lacour et al.; licensee BioMed Central.

Figures

Figure 1.
