BMC Bioinformatics
A comparison of graph- and kernel-based –omics data integration algorithms for classifying complex traits
Research Article
Hongyu Zhao [1], Herbert Pang [2], Kang K. Yan [2]
[1] Department of Biostatistics, Yale University, New Haven, CT, USA; [2] School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
Keywords: Bayesian network; Relevance vector machine; Graph-based semi-supervised learning; Semi-definite programming (SDP)-support vector machine; Multiple data sources; Classification
DOI: 10.1186/s12859-017-1982-4
Received 2017-06-05, accepted 2017-11-26, published 2017
Source: Springer
【 Abstract 】
Background: High-throughput sequencing data are widely collected and analyzed in the study of complex diseases, with the aim of improving human health. Well-studied algorithms mostly deal with a single data source and cannot fully exploit the potential of multi-omics data. To provide a holistic understanding of human health and disease, it is necessary to integrate multiple data sources. Several algorithms have been proposed so far; however, a comprehensive comparison of data integration algorithms for the classification of binary traits is currently lacking.

Results: In this paper, we focus on two common classes of integration algorithms: graph-based algorithms, which represent subjects as nodes and the relationships between them as edges, and kernel-based algorithms, which build a classifier in feature space. Our paper provides a comprehensive comparison of their performance in terms of several measures of classification accuracy and of computation time. Seven integration algorithms (graph-based semi-supervised learning, graph sharpening integration, composite association network, Bayesian network, semi-definite programming-support vector machine (SDP-SVM), relevance vector machine (RVM), and Ada-boost relevance vector machine) are compared and evaluated on a hypertension data set and two cancer data sets. In general, kernel-based algorithms create more complex models and require longer computation time, but they tend to perform better than graph-based algorithms. Graph-based algorithms have the advantage of being computationally faster.

Conclusions: The empirical results demonstrate that composite association network, relevance vector machine, and Ada-boost RVM are the better performers. We provide recommendations on how to choose an appropriate algorithm for integrating data from multiple sources.
【 License 】
CC BY
© The Author(s). 2017