BioData Mining
A classification and characterization of two-locus, pure, strict, epistatic models for simulation and detection
Ryan J Urbanowicz [1], Ambrose LS Granizo-Mackenzie [1], Jeff Kiralis [1], Jason H Moore [1]
[1] Department of Genetics, Dartmouth College, 1 Medical Center Dr., Lebanon, NH 05001, USA
Keywords: Convex hull; Computational geometry; GAMETES; Genetics; Simulation; Models; Epistasis
DOI  :  10.1186/1756-0381-7-8
Received 2013-11-05; accepted 2014-05-23; published 2014
【 Abstract 】

Background

The statistical genetics phenomenon of epistasis is widely acknowledged to confound disease etiology. To evaluate strategies for detecting these complex multi-locus disease associations, simulation studies are required. The development of the GAMETES software for generating complex genetic models has provided the means to randomly generate an architecturally diverse population of epistatic models that are both pure and strict, i.e., all n loci, but no fewer, are predictive of phenotype. Previous theoretical work characterizing complex genetic models has yet to examine pure, strict epistasis, which should be the most challenging to detect. This study addresses three goals: (1) classify and characterize pure, strict, two-locus epistatic models; (2) investigate the effect of model ‘architecture’ on detection difficulty; and (3) explore how adjusting GAMETES constraints influences diversity in the generated models.
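The "no fewer than n loci" condition can be made concrete for n = 2: neither locus on its own should shift disease risk. The sketch below is illustrative only and is not taken from GAMETES. It assumes Hardy-Weinberg genotype frequencies at each locus and linkage equilibrium between the two loci. The function names and the XOR-like example table are hypothetical. Under those assumptions, a 3×3 penetrance table has no single-locus marginal effect when the penetrance averaged over the other locus equals the population prevalence K for every genotype.

```python
from typing import List, Sequence

Table = Sequence[Sequence[float]]  # rows: locus A genotype (0/1/2 minor alleles); cols: locus B


def genotype_freqs(maf: float) -> List[float]:
    """Genotype frequencies for 0, 1, 2 minor alleles (assumption: Hardy-Weinberg equilibrium)."""
    major = 1.0 - maf
    return [major * major, 2.0 * maf * major, maf * maf]


def prevalence(table: Table, maf_a: float, maf_b: float) -> float:
    """Population prevalence K: penetrance weighted by joint genotype frequency.

    Assumes linkage equilibrium, so the joint frequency is the product of the two marginals.
    """
    qa, qb = genotype_freqs(maf_a), genotype_freqs(maf_b)
    return sum(qa[i] * qb[j] * table[i][j] for i in range(3) for j in range(3))


def marginal_penetrances(table: Table, maf_a: float, maf_b: float):
    """Return (per-genotype penetrance at locus A, per-genotype penetrance at locus B)."""
    qa, qb = genotype_freqs(maf_a), genotype_freqs(maf_b)
    rows = [sum(qb[j] * table[i][j] for j in range(3)) for i in range(3)]
    cols = [sum(qa[i] * table[i][j] for i in range(3)) for j in range(3)]
    return rows, cols


def is_pure(table: Table, maf_a: float, maf_b: float, tol: float = 1e-9) -> bool:
    """True if every single-locus marginal penetrance equals K, i.e. no main effect at either locus."""
    k = prevalence(table, maf_a, maf_b)
    rows, cols = marginal_penetrances(table, maf_a, maf_b)
    return all(abs(m - k) <= tol for m in rows + cols)


# Hypothetical XOR-like penetrance table used only for illustration.
xor_like = [[0.0, 1.0, 0.0],
            [1.0, 0.0, 1.0],
            [0.0, 1.0, 0.0]]

print(prevalence(xor_like, 0.5, 0.5))  # 0.5
print(is_pure(xor_like, 0.5, 0.5))     # True
```

Purity in this sketch depends on the allele frequencies. The same table gives `is_pure(xor_like, 0.2, 0.2) == False`, so a penetrance table is only meaningful together with its minor allele frequencies. A flat table also passes the check trivially, so a useful model additionally needs the joint penetrances to vary.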

Results

In this study, we used a geometric approach to classify pure, strict, two-locus epistatic models by “shape”. In total, 33 unique shape symmetry classes were identified. Using a detection difficulty metric, we found that model shape was consistently a significant predictor of detection difficulty. Additionally, after categorizing shape classes by the number of edges in their shape projections, we found that edge number was also a significant predictor of detection difficulty. Analysis of constraints within GAMETES indicated that increasing the model population size can expand model class coverage but does little to change the range of observed difficulty metric scores. A variable population prevalence significantly increased the range of observed difficulty metric scores and, for certain constraints, also improved model class coverage.
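The two diversity measures compared across GAMETES settings are shape-class coverage and the range of difficulty scores. Both can be summarized for any generated model population. The sketch below is hypothetical. It assumes an upstream step has already given each model a shape-class label and a difficulty score; the geometric classification itself is not shown. It uses the 33 classes reported here as the coverage denominator. The study's own definition of coverage may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

TOTAL_SHAPE_CLASSES = 33  # number of shape symmetry classes identified in this study


@dataclass
class ModelSummary:
    shape_class: int   # hypothetical label (1..33) assigned by an upstream shape classifier
    difficulty: float  # hypothetical detection-difficulty metric score for the model


def population_diversity(models: Iterable[ModelSummary]) -> Tuple[float, float]:
    """Return (fraction of shape classes observed, max - min difficulty score)."""
    models = list(models)
    if not models:
        raise ValueError("empty model population")
    observed_classes = {m.shape_class for m in models}
    scores = [m.difficulty for m in models]
    return len(observed_classes) / TOTAL_SHAPE_CLASSES, max(scores) - min(scores)


# Hypothetical three-model population covering two shape classes.
population = [ModelSummary(1, 0.25), ModelSummary(1, 0.5), ModelSummary(7, 0.75)]
coverage, spread = population_diversity(population)
```

Running this summary on populations generated with different sizes or prevalence settings gives the kind of comparison described above. Coverage counts distinct classes, and the spread captures how wide a range of difficulty the population spans.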

Conclusions

These analyses further our theoretical understanding of epistatic relationships and uncover guidelines for the effective generation of complex models using GAMETES. Specifically, (1) we have characterized 33 shape classes by edge number, detection difficulty, and observed frequency; (2) our results support the claim that model architecture directly influences detection difficulty; and (3) we found that GAMETES generates a maximally diverse set of models with a variable population prevalence and a larger model population size. However, a model population size as small as 1,000 is likely to be sufficient.

【 License 】

© 2014 Urbanowicz et al.; licensee BioMed Central Ltd.

【 Figures 】

Figures 1 to 6 (images not included).
