Journal article details
Statistical Analysis and Data Mining
Comparison of merging strategies for building machine learning models on multiple independent gene expression data sets
article
Jessica Krepel [1], Magdalena Kircher [1], Moritz Kohls [1], Klaus Jung [1]
[1] Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover
Keywords: artificial neural networks; data fusion; discriminant analysis; gene expression data; high-dimensional data; LASSO; random forest; support vector machines
DOI: 10.1002/sam.11549
Subject classification: Social Sciences, Humanities and Arts (General)
Source: John Wiley & Sons, Inc.
【 Abstract 】

High-dimensional gene expression data are regularly studied for their ability to separate different groups of samples by means of machine learning (ML) models. Meanwhile, a large number of such data sets are publicly available. Several approaches for meta-analysis of independent gene expression data sets have been proposed, mainly focusing on feature selection, a typical step in fitting an ML model. Here, we compare different strategies for merging the information from such independent data sets to train a classifier. Specifically, we compare merging the data sets directly (strategy A) with merging the classification results (strategy B). We use simulations with purely artificial data as well as evaluations based on independent gene expression data from lung fibrosis studies to compare the two merging approaches. In the simulations, the number of studies, the strength of batch effects, and the separability are varied. The comparison incorporates five standard ML techniques typically used for high-dimensional data, namely discriminant analysis, support vector machines, least absolute shrinkage and selection operator (LASSO), random forest, and artificial neural networks. Using cross-study validation, we found that direct data merging yields higher accuracies when training data from three or four studies are available, whereas merging of classification results performed better with only two training studies. In the evaluation with the lung fibrosis data, both strategies showed similar performance.
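
The two merging strategies named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data generator `simulate_study`, the additive per-gene batch offset, the nearest-centroid classifier (a simple stand-in, not one of the five techniques compared), and the use of a majority vote to merge classification results in strategy B. One study is held out for testing, in the spirit of cross-study validation.

```python
import random
from collections import Counter


def simulate_study(rng, n_per_class=20, n_genes=50, shift=1.5, batch_sd=1.0):
    """Hypothetical study: two classes plus an additive per-gene batch offset."""
    batch = [rng.gauss(0.0, batch_sd) for _ in range(n_genes)]
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            # Assumption: only the first 10 genes carry class signal.
            row = [
                rng.gauss(0.0, 1.0) + batch[g] + (shift * label if g < 10 else 0.0)
                for g in range(n_genes)
            ]
            X.append(row)
            y.append(label)
    return X, y


def fit_centroids(X, y):
    """Nearest-centroid 'model': the per-class mean expression profile."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def strategy_a(train_studies, X_test):
    """Strategy A: pool all training studies, then fit a single classifier."""
    X = [x for Xs, _ in train_studies for x in Xs]
    y = [lab for _, ys in train_studies for lab in ys]
    model = fit_centroids(X, y)
    return [predict(model, x) for x in X_test]


def strategy_b(train_studies, X_test):
    """Strategy B: fit one classifier per study, merge predictions by vote."""
    models = [fit_centroids(Xs, ys) for Xs, ys in train_studies]
    merged = []
    for x in X_test:
        votes = Counter(predict(m, x) for m in models)
        # On a tie, the label counted first wins (arbitrary choice).
        merged.append(votes.most_common(1)[0][0])
    return merged


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


rng = random.Random(0)
studies = [simulate_study(rng) for _ in range(4)]
*train, (X_test, y_test) = studies  # hold out the last study for testing
acc_a = accuracy(strategy_a(train, X_test), y_test)
acc_b = accuracy(strategy_b(train, X_test), y_test)
print(f"strategy A (merge data): {acc_a:.2f}")
print(f"strategy B (merge predictions): {acc_b:.2f}")
```

Varying the number of training studies, `batch_sd`, and `shift` in this sketch loosely mirrors the simulation factors listed in the abstract (number of studies, strength of batch effects, separability); the accuracies it prints say nothing about the paper's actual results.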

【 License 】

Unknown   
