BMC Bioinformatics
Latent-space embedding of expression data identifies gene signatures from sputum samples of asthmatic patients
Mark Gerstein [1], Shaoke Lou [1], Tianxiao Li [1], Daniel Spakowicz [2], Geoffrey Lowell Chupp [3], Xiting Yan [3]
[1] Program in Computational Biology and Bioinformatics, Yale University, 06520, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, 06520, New Haven, CT, USA
[2] Division of Medical Oncology, The Ohio State University, 43210, Columbus, OH, USA
[3] Pulmonary and Critical Care, Yale School of Medicine, 06520, New Haven, CT, USA
Keywords: Asthma; Asthma subtypes; Denoising autoencoder; Biomarker; Non-invasive
DOI: 10.1186/s12859-020-03785-y
Source: Springer
【 Abstract 】
Background: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Autoencoders are potentially ideal frameworks for such tasks because they can embed complex, noisy, high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from expression data.
Results: Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well with previously defined, clinically relevant clusters of patients. Moreover, every gene contributes to each hidden unit, and pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used the genes that contribute most to the hidden units to develop a secondary random-forest classifier for directly predicting asthma severity. The feature importance metric from this classifier identified a signature of 50 key genes associated with severity. Furthermore, we can use these key genes to successfully estimate FEV1/FVC ratios across patients via support-vector-machine regression.
Conclusion: The denoising autoencoder framework can extract meaningful patterns corresponding to functional gene groups and patient clusters from the gene expression of asthma patients.
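To make the first stage of the framework concrete, the following is a minimal NumPy sketch of a denoising autoencoder whose encoder weights are then used to rank genes by their contribution to each hidden unit. It is not the authors' implementation. The single hidden layer, sigmoid activations, masking-noise corruption, squared-error loss, full-batch gradient descent, and the choice of absolute encoder weight as a gene's "contribution" are all illustrative assumptions, and the helper names (`train_denoising_autoencoder`, `embed`, `top_contributing_genes`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_denoising_autoencoder(X, n_hidden=50, corruption=0.1, lr=0.5,
                                epochs=200, seed=0):
    """Train a one-hidden-layer denoising autoencoder by full-batch gradient descent.

    Assumption: X is an (n_samples, n_genes) expression matrix scaled to [0, 1].
    Returns the encoder weights W (n_genes, n_hidden) and bias b (n_hidden,).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.01, size=(d, n_hidden))   # encoder weights
    b = np.zeros(n_hidden)                          # encoder bias
    W2 = rng.normal(0.0, 0.01, size=(n_hidden, d))  # decoder weights
    b2 = np.zeros(d)                                # decoder bias
    for _ in range(epochs):
        # Masking noise (assumed corruption scheme): zero out a random fraction of inputs.
        X_noisy = X * (rng.random(X.shape) > corruption)
        H = sigmoid(X_noisy @ W + b)      # latent embedding of the corrupted input
        X_hat = sigmoid(H @ W2 + b2)      # reconstruction, compared to the clean X
        # Gradients of 0.5 * squared reconstruction error, averaged over samples.
        dZ2 = (X_hat - X) * X_hat * (1.0 - X_hat) / n
        dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W -= lr * (X_noisy.T @ dZ1)
        b -= lr * dZ1.sum(axis=0)
    return W, b

def embed(X, W, b):
    """Low-dimensional embedding of clean (uncorrupted) samples."""
    return sigmoid(X @ W + b)

def top_contributing_genes(W, genes_per_unit=5):
    """Union of the genes with the largest |weight| into each hidden unit."""
    contrib = np.abs(W)                                    # (n_genes, n_hidden)
    top = np.argsort(-contrib, axis=0)[:genes_per_unit]    # rank x hidden unit
    return np.unique(top)

# Synthetic stand-in data (random values, NOT patient data): 40 samples x 200 genes.
rng = np.random.default_rng(1)
X = rng.random((40, 200))
W, b = train_denoising_autoencoder(X, n_hidden=50)
Z = embed(X, W, b)                                  # (40, 50) latent embedding
genes = top_contributing_genes(W, genes_per_unit=3)  # candidate genes for a classifier
```

In this sketch, `Z` is what one would pass to hierarchical clustering (for example `scipy.cluster.hierarchy.linkage`), and the columns `X[:, genes]` are what one would pass to a random-forest classifier (for example scikit-learn's `RandomForestClassifier`, whose `feature_importances_` ranks genes) or to support-vector regression of FEV1/FVC (for example `sklearn.svm.SVR`). Those downstream steps are omitted here to keep the example dependency-light.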
【 License 】
CC BY