Journal Article Details
BioData Mining
A maximum flow-based network approach for identification of stable noncoding biomarkers associated with the multigenic neurological condition, autism
Brianna S. Chrisman [1], Peter Y. Washington [1], Jae-Yoon Jung [2], Dennis P. Wall [2], Kelley M. Paskov [2], Min Woo Sun [2], Maya Varma [3], Nate T. Stockham [4]
[1] Department of Bioengineering, Stanford University; Department of Biomedical Data Science, Stanford University; Department of Computer Science, Stanford University; Department of Neuroscience, Stanford University
Keywords: Maximum flow; Network; Feature selection; Feature stability; Linkage disequilibrium; Machine learning
DOI: 10.1186/s13040-021-00262-x
Source: DOAJ
Abstract

Background: Machine learning approaches for predicting disease risk from high-dimensional whole genome sequence (WGS) data often produce unstable models that can be difficult to interpret, limiting the identification of putative sets of biomarkers. Here, we design and validate a graph-based methodology based on maximum flow, which leverages the presence of linkage disequilibrium (LD) to identify stable sets of variants associated with complex multigenic disorders.

Results: We apply our method to a previously published logistic regression model trained to identify variants in simple repeat sequences associated with autism spectrum disorder (ASD); this L1-regularized model exhibits high predictive accuracy yet shows great variability in the features selected from over 230,000 possible variants. To improve model stability, we extract the variants assigned non-zero weights in each of five cross-validation folds and then assemble the five sets of features into a flow network subject to LD constraints. The maximum flow formulation allowed us to identify 55 variants, which we show to be more stable than the features selected by the original classifier.

Conclusion: Our method allows for the creation of machine learning models that can identify predictive variants. Our results help pave the way toward biomarker-based diagnostic methods for complex genetic disorders.
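The abstract does not say how the five fold-wise feature sets are wired into the flow network, so the following is only one plausible construction, written for illustration and not the authors' implementation. It assumes, hypothetically, that each variant in each fold is a node with unit capacity (split into "in" and "out" halves), that an edge joins variants in consecutive folds when they are the same variant or appear together in a precomputed LD set `ld_pairs` (for example, pairs whose r² exceeds some threshold), and that the source feeds the first fold while the last fold drains to the sink. The maximum flow is computed with a plain Edmonds-Karp routine. The names `max_flow` and `stable_variants`, the LD rule, and the toy data are all invented.

```python
from collections import defaultdict, deque

# Hypothetical sketch, NOT the construction used in the paper (the abstract
# does not give its details). Nodes are (fold_index, variant_id, "in"/"out").


def max_flow(cap, source, sink):
    """Edmonds-Karp maximum flow: augment along shortest (BFS) residual paths."""
    residual = defaultdict(lambda: defaultdict(int))
    for u in cap:
        for v, c in cap[u].items():
            residual[u][v] += c
            residual[v][u] += 0  # make sure every reverse edge exists
    total = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual
        # Bottleneck capacity along the path that was found.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow forward and record it on the reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def stable_variants(folds, ld_pairs):
    """Link variants in consecutive folds if identical or in LD (assumed rule).

    Returns the max-flow value (number of vertex-disjoint chains spanning all
    folds) and the set of variants lying on flow-carrying chains.
    """
    source, sink = "source", "sink"
    cap = defaultdict(dict)
    last = len(folds) - 1
    for k, fold in enumerate(folds):
        for v in fold:
            # Node split: each variant is used by at most one chain per fold.
            cap[(k, v, "in")][(k, v, "out")] = 1
            if k == 0:
                cap[source][(k, v, "in")] = 1
            if k == last:
                cap[(k, v, "out")][sink] = 1
    for k in range(last):
        for u in folds[k]:
            for v in folds[k + 1]:
                if u == v or frozenset((u, v)) in ld_pairs:
                    cap[(k, u, "out")][(k + 1, v, "in")] = 1
    flow, residual = max_flow(cap, source, sink)
    used = {
        node[1]
        for node in cap
        if isinstance(node, tuple) and node[2] == "in"
        # Residual 0 on the in->out edge means one unit of flow passed through.
        and residual[node][(node[0], node[1], "out")] == 0
    }
    return flow, used


# Toy input (invented): variants with non-zero weight in each of five folds.
folds = [{"a", "b", "c"}, {"a", "d"}, {"a", "b"}, {"a", "e"}, {"a", "b"}]
# Hypothetical LD pairs, e.g. variant pairs whose r^2 exceeds a threshold.
ld_pairs = {frozenset(("b", "d")), frozenset(("b", "e"))}
flow, used = stable_variants(folds, ld_pairs)
print(flow, sorted(used))  # prints: 2 ['a', 'b', 'd', 'e']
```

Under these assumptions the flow value counts vertex-disjoint chains of identical or LD-linked variants that persist across every fold, so a variant selected in only one fold with no LD partner in the next (`c` in the toy data) carries no flow and is dropped. The unit node capacities are a design choice of this sketch; other capacity or edge rules would select variants differently.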

License

Unknown
