GigaScience
Determination of nonlinear genetic architecture using compressed sensing
Stephen DH Hsu [1], Chiu Man Ho [1]
[1] Department of Physics and Astronomy, Michigan State University, 567 Wilson Road, East Lansing, MI 48824, USA
Keywords: Nonlinear interactions; Compressed sensing; Genomics
DOI: 10.1186/s13742-015-0081-6
Received 2014-09-24; accepted 2015-08-21; publication year 2015.
【 Abstract 】

Background

One of the fundamental problems of modern genomics is to extract the genetic architecture of a complex trait from a data set of individual genotypes and trait values. Establishing this important connection between genotype and phenotype is complicated by the large number of candidate genes, the potentially large number of causal loci, and the likely presence of some nonlinear interactions between different genes. Compressed Sensing methods obtain solutions to under-constrained systems of linear equations. These methods can be applied to the problem of determining the best model relating genotype to phenotype, and generally deliver better performance than simply regressing the phenotype against each genetic variant, one at a time. We introduce a Compressed Sensing method that can reconstruct nonlinear genetic models (i.e., including epistasis, or gene-gene interactions) from phenotype-genotype (GWAS) data. Our method uses L1-penalized regression applied to nonlinear functions of the sensing matrix.
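The core idea (L1-penalized regression applied to nonlinear functions of the genotype matrix) can be illustrated with a small, self-contained sketch. It assumes the nonlinear features are pairwise products of standardized genotypes and uses plain cyclic coordinate descent as the L1 solver. The function names, simulated allele frequencies, sample size, and penalty `lam` are hypothetical choices for illustration, not the authors' code or settings. With 24 loci there are 300 candidate terms but only 250 simulated individuals, so the linear system is under-constrained.

```python
import math
import random
from itertools import combinations


def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    if sd == 0.0:
        return [0.0] * n  # monomorphic column carries no information
    return [(v - mean) / sd for v in col]


def nonlinear_sensing_matrix(genotypes):
    """Columns for every single locus plus every pairwise (epistatic) product.

    genotypes: n rows of p allele counts (0, 1, 2).
    Returns (columns, labels); labels[k] is (i,) or (i, j).
    """
    p = len(genotypes[0])
    z = [standardize([row[j] for row in genotypes]) for j in range(p)]
    columns, labels = list(z), [(j,) for j in range(p)]
    for i, j in combinations(range(p), 2):
        columns.append(standardize([a * b for a, b in zip(z[i], z[j])]))
        labels.append((i, j))
    return columns, labels


def lasso_cd(columns, y, lam, sweeps=500, tol=1e-9):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes every nonzero column has mean 0 and (1/n)||x||^2 == 1.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)
    for _ in range(sweeps):
        max_step = 0.0
        for k, col in enumerate(columns):
            # Correlation of column k with the partial residual (term k added back).
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[k]
            new = math.copysign(max(abs(rho) - lam, 0.0), rho)  # soft threshold
            delta = new - beta[k]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[k] = new
                max_step = max(max_step, abs(delta))
        if max_step < tol:
            break
    return beta


def demo(n=250, p=24, lam=0.2, seed=7):
    """Hypothetical simulation: sparse nonlinear trait, then L1 recovery."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.25, 0.5) for _ in range(p)]
    genotypes = [[sum(rng.random() < f for _ in range(2)) for f in freqs]
                 for _ in range(n)]
    columns, labels = nonlinear_sensing_matrix(genotypes)
    # Hypothetical architecture: two additive loci and one epistatic pair.
    truth = {(0,): 1.0, (3,): 1.0, (5, 7): 1.0}
    y = [0.0] * n
    for label, coef in truth.items():
        col = columns[labels.index(label)]
        y = [yi + coef * c for yi, c in zip(y, col)]
    beta = lasso_cd(columns, y, lam)
    support = {labels[k] for k, b in enumerate(beta) if b != 0.0}
    return support, beta, labels


if __name__ == "__main__":
    support, beta, labels = demo()
    print(len(labels), "candidate terms; selected:", sorted(support))
```

In this noise-free toy setting, with only three nonzero terms among 300 candidates, the lasso is expected to select exactly the two additive loci and the one interacting pair, with coefficients shrunk somewhat below 1 by the penalty. Real GWAS data would add noise, linkage disequilibrium, and a need to tune `lam`.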

Results

The computational and data resource requirements for our method are similar to those necessary for reconstruction of linear genetic models (or identification of gene-trait associations), assuming a condition of generalized sparsity, which limits the total number of gene-gene interactions. An example of a sparse nonlinear model is one in which a typical locus interacts with several or even many others, but only a small subset of all possible interactions exists. It seems plausible that most genetic architectures fall into this category. We give theoretical arguments suggesting that the method is nearly optimal in performance, and demonstrate its effectiveness on broad classes of nonlinear genetic models using simulated human genomes and the small amount of currently available real data. A phase transition (i.e., a dramatic, qualitative change) in the behavior of the algorithm indicates when sufficient data are available for its successful application.
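Generalized sparsity can be made concrete with simple bookkeeping: the number of candidate pairwise terms grows quadratically with the number of loci, while the number of nonzero terms can stay small. In compressed sensing, the sample size needed for recovery is driven mainly by the number of nonzero terms and grows only logarithmically with the number of candidates, which is why this count matters. The sketch below assumes a pairwise-only model in which each causal locus interacts, on average, with `k` other causal loci; the function `term_counts` and the example sizes are hypothetical.

```python
from math import comb


def term_counts(p, s, k):
    """Return (candidate_terms, nonzero_terms) for a pairwise-interaction model.

    p: genotyped loci; s: causal loci; k: average number of interaction
    partners per causal locus (hypothetical model: partners are themselves
    causal loci, so each interacting pair is counted once).
    """
    if not (0 <= s <= p) or not (0 <= k < max(s, 1)):
        raise ValueError("need 0 <= s <= p and 0 <= k < s")
    candidates = p + comb(p, 2)   # every locus plus every unordered pair
    nonzero = s + (s * k) // 2    # each interaction is shared by two loci
    return candidates, nonzero


if __name__ == "__main__":
    for p, s, k in [(1_000, 10, 4), (10**6, 10**4, 10)]:
        cand, nz = term_counts(p, s, k)
        print(f"p={p:,}: {cand:,} candidate terms, {nz:,} nonzero ({nz / cand:.1e})")
```

For example, 10^6 loci give about 5 × 10^11 candidate terms, yet 10^4 causal loci with about 10 partners each contribute only 6 × 10^4 nonzero terms.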

Conclusion

Our results indicate that predictive models for many complex traits, including a variety of human disease susceptibilities (e.g., with additive heritability h² ∼ 0.5), can be extracted from data sets comprising n ∼ 100 × s individuals, where s is the number of distinct causal variants influencing the trait. For example, given a trait controlled by ∼10,000 loci, roughly a million individuals would be sufficient for application of the method.
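The Conclusion's estimate is simple arithmetic on the rule of thumb n ∼ 100 × s. A minimal sketch, in which the function name `individuals_needed` and its `per_variant` parameter are hypothetical, and the factor of 100 is the scale quoted for traits with additive heritability h² ∼ 0.5 rather than a universal constant:

```python
def individuals_needed(s, per_variant=100):
    """Rough sample size from the rule of thumb n ~ per_variant * s.

    s: number of distinct causal variants. per_variant ~ 100 is the scale
    quoted for h2 ~ 0.5; treat it as an order-of-magnitude guide only.
    """
    if s < 0:
        raise ValueError("s must be non-negative")
    return per_variant * s


if __name__ == "__main__":
    for s in (1_000, 10_000, 100_000):
        print(f"s = {s:>7,} causal variants -> n ~ {individuals_needed(s):,} individuals")
```

For s ≈ 10,000 this gives n ≈ 10^6, matching the "roughly a million individuals" figure.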

【 License 】

© 2015 Ho and Hsu.

【 Figures 】

Fig. 1 to Fig. 8 (images only; captions not available).
