Journal Article Details
Journal of Computational Biology
Deep Large-Scale Multitask Learning Network for Gene Expression Inference
article
Kamran Ghasedi Dizaji [1], Wei Chen [2], Heng Huang [1]
[1] Department of Electrical and Computer Engineering, University of Pittsburgh;Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh;Department of Biomedical Informatics, School of Medicine, University of Pittsburgh
Keywords: deep regression model; gene expression inference; landmark genes; multitask learning; target genes
DOI: 10.1089/cmb.2020.0438
Source: Mary Ann Liebert, Inc. Publishers
【 Abstract 】

Gene expression profiling makes it possible to conduct many biological studies in a variety of fields because it thoroughly characterizes cellular states under various experimental conditions. Despite recent advances in high-throughput technology, profiling an entire set of genomes is still difficult and expensive. Because the expression patterns of different genes are highly correlated, this problem can be addressed with a cost-effective approach that measures only a small subset of genes, called landmark genes, which represent the entire set, and infers the remaining genes, called target genes, using a computational model. Several shallow and deep regression models in the literature estimate the expressions of target genes from the landmark genes. However, the shallow models mostly have limited capacity to learn nonlinear and complex gene expression data and are prone to underfitting, whereas the deep models generally do not take advantage of the correlation among target genes during learning and suffer from overfitting. Treating gene expression inference as a multitask learning problem, we propose a new deep multitask learning algorithm to tackle these issues. Our learning framework automatically learns the correlation between target genes and uses this knowledge to improve its generalization. Specifically, we utilize a subnetwork with low-dimensional latent variables to discover the relationships between target genes and impose a seamless, easy-to-implement regularization on our deep regression model. Unlike existing multitask learning methods that can only deal with dozens or hundreds of tasks, our algorithm can efficiently learn the relationships between ~10,000 target genes and is thus scalable to a large number of tasks.
Our proposed method outperforms the shallow and deep regression models for gene expression inference and alternative multitask learning algorithms on two large-scale datasets regardless of the network architecture.
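
The abstract does not give the form of the regularizer, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a one-hidden-layer regression network from landmark to target genes and a hypothetical low-rank penalty: each target gene gets a low-dimensional latent vector (a column of `Z`), and the per-target output weights `W_out` are pulled toward a reconstruction `D @ Z` from those latents. All names (`W_out`, `D`, `Z`, `lam`) and the penalty itself are illustrative assumptions.

```python
import numpy as np


def init_params(n_landmark, n_hidden, n_target, n_latent, seed=0):
    """Randomly initialise the regression network and the latent subnetwork."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_landmark, n_hidden)),   # shared hidden layer
        "b1": np.zeros(n_hidden),
        "W_out": rng.normal(0.0, 0.1, (n_hidden, n_target)),  # one column per target gene
        "b_out": np.zeros(n_target),
        # Assumed latent parameterisation: n_latent numbers per target gene.
        "D": rng.normal(0.0, 0.1, (n_hidden, n_latent)),
        "Z": rng.normal(0.0, 0.1, (n_latent, n_target)),
    }


def predict(params, x):
    """Predict target-gene expressions from landmark-gene expressions."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W_out"] + params["b_out"]


def relation_penalty(params):
    """Mean squared gap between the output weights and their latent reconstruction."""
    recon = params["D"] @ params["Z"]
    return float(np.mean((params["W_out"] - recon) ** 2))


def multitask_loss(params, x, y, lam=0.1):
    """Regression error plus the (assumed) task-relationship regulariser."""
    mse = float(np.mean((predict(params, x) - y) ** 2))
    return mse + lam * relation_penalty(params)
```

In this sketch the relationships among T target genes are stored as `n_latent` numbers per gene rather than as a full T × T matrix, which is one plausible reading of why low-dimensional latent variables could scale to roughly 10,000 tasks.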

【 License 】

Unknown   
