Volume: 5
Predicting the prevalence of complex genetic diseases from individual genotype profiles using capsule networks
Article
Keywords: ASSOCIATION ANALYSES; RISK LOCI; ARCHITECTURE; VARIANTS; PROTEINS; IDENTIFY; TRAITS; COMMON
DOI: 10.1038/s42256-022-00604-2
Source: SCIE
【 Abstract 】
Disease phenotypes can be predicted from genetic profiles, but diseases with complex, non-additive interactions between genes are hard to disentangle. An approach called DiseaseCapsule uses capsule networks to identify the hierarchical structure in genomic data and can predict complex diseases such as amyotrophic lateral sclerosis with high accuracy.

Diseases that have a complex genetic architecture tend to involve considerable numbers of genetic variants that play a role in the disease but have not yet been identified as such. Two major causes have been put forward for this phenomenon: first, genetic variants may not stack up their effects additively but instead interact in complex ways; second, the recently suggested omnigenic model postulates that variants interact in a holistic manner to establish disease phenotypes. Here we present DiseaseCapsule, a capsule-network-based approach that explicitly aims to capture the hierarchical structure of the underlying genome data and has the potential to fully capture the non-linear relationships between variants and disease. DiseaseCapsule is the first such approach to operate in a whole-genome manner when predicting disease occurrence from individual genotype profiles. In experiments, we evaluated DiseaseCapsule on amyotrophic lateral sclerosis (ALS) and Parkinson's disease, with a particular emphasis on ALS, which is known to have a complex genetic architecture and to be affected by 40% missing heritability. On ALS, DiseaseCapsule achieves 86.9% accuracy on hold-out test data in predicting disease occurrence, outperforming all other approaches by large margins. DiseaseCapsule also required substantially less training data to reach optimal performance. Finally, the systematic exploitation of the network architecture yielded 922 genes of particular interest, and 644 'non-additive' genes that are crucial factors in DiseaseCapsule but remain masked within linear schemes.
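Capsule networks represent entities as vectors and pass information upward by letting lower-level capsules "vote" for higher-level ones, with agreeing votes reinforced. As a minimal sketch only, assuming the generic dynamic routing-by-agreement procedure of Sabour et al. (2017), the plain-Python code below shows how such votes are combined. It is not DiseaseCapsule's implementation: the function names (`squash`, `softmax`, `dynamic_routing`), the three routing iterations and the toy prediction vectors are hypothetical choices made for illustration.

```python
import math


def squash(v):
    """Capsule non-linearity: keeps the direction of v, maps its length into [0, 1)."""
    sq_norm = sum(x * x for x in v)
    if sq_norm == 0.0:
        return [0.0] * len(v)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in v]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, n_iters=3):
    """Generic routing-by-agreement (Sabour et al., 2017); hypothetical, not DiseaseCapsule's code.

    u_hat[i][j] is the prediction vector of lower capsule i for upper capsule j.
    Returns one output vector v[j] per upper capsule; its length acts as an
    activation score.
    """
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    # Routing logits b[i][j], initialised to zero (uniform coupling).
    b = [[0.0] * n_upper for _ in range(n_lower)]
    v = []
    for it in range(n_iters):
        # Coupling coefficients: each lower capsule splits its vote over upper capsules.
        c = [softmax(row) for row in b]
        # Weighted sum of predictions, then squash, for every upper capsule.
        v = []
        for j in range(n_upper):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower)) for d in range(dim)]
            v.append(squash(s_j))
        if it < n_iters - 1:
            # Raise the logit wherever a prediction agrees (dot product) with the output.
            for i in range(n_lower):
                for j in range(n_upper):
                    b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


# Hypothetical toy input: three lower capsules, two upper capsules, 2-D vectors.
# All lower capsules agree on upper capsule 0 but contradict each other on capsule 1.
u_hat_toy = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
v_toy = dynamic_routing(u_hat_toy)
lengths = [math.sqrt(sum(x * x for x in vj)) for vj in v_toy]
print(lengths)  # upper capsule 0 (agreement) ends up longer than capsule 1
```

In this toy run, the upper capsule that receives consistent votes ends up with a longer output vector than the one receiving contradictory votes. This is the general sense in which routing-by-agreement can weight combinations of inputs non-linearly, rather than summing their contributions with fixed weights as a linear scheme would.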
【 License 】
Free