| BMC Genomics | |
| A machine learning heuristic to identify biologically relevant and minimal biomarker panels from omics data | |
| Research | |
| Mohammed H Alqahtani1  Dov J Stekel2  Anna L Swan2  Charlie Hodgman3  Ali Mobasheri4  Jaume Bacardit5  David Allaway6  | |
| [1] Center of Excellence in Genomic Medicine Research (CEGMR), King AbdulAziz University, 21589, Jeddah, Kingdom of Saudi Arabia;School of Biosciences, Faculty of Science, University of Nottingham, Sutton Bonington Campus, LE12 5RD, Leicestershire, United Kingdom;School of Biosciences, Faculty of Science, University of Nottingham, Sutton Bonington Campus, LE12 5RD, Leicestershire, United Kingdom;The D-BOARD European Consortium for Biomarker Discovery, The Universities of Surrey, Nottingham and Newcastle, United Kingdom;The D-BOARD European Consortium for Biomarker Discovery, The Universities of Surrey, Nottingham and Newcastle, United Kingdom;School of Veterinary Medicine, Faculty of Health and Medical Sciences, University of Surrey, Duke of Kent Building, GU2 7XH, Guildford, Surrey, United Kingdom;Center of Excellence in Genomic Medicine Research (CEGMR), King AbdulAziz University, 21589, Jeddah, Kingdom of Saudi Arabia;Arthritis Research UK Centre for Sport, Exercise, and Osteoarthritis, Arthritis Research UK Pain Centre, Medical Research Council-Arthritis Research UK Centre for Musculoskeletal Ageing Research, Faculty of Medicine and Health Sciences, University of Nottingham, University Park, NG7 2RD, Nottingham, United Kingdom;The D-BOARD European Consortium for Biomarker Discovery, The Universities of Surrey, Nottingham and Newcastle, United Kingdom;The Interdisciplinary Computing and Complex BioSystems (ICOS) research group, School of Computing Science, Newcastle University, Claremont Tower, NE1 7RU, Newcastle-upon-Tyne, United Kingdom;WALTHAM® Centre for Pet Nutrition, Waltham-on-the-Wolds, Melton Mowbray, LE14 4RT, Leicestershire, United Kingdom; | |
| Keywords: Support Vector Machine; Feature Selection; Classification Accuracy; Random Forest; Feature Selection Method; | |
| DOI : 10.1186/1471-2164-16-S1-S2 | |
| Source: Springer | |
【 Abstract 】
Background: Investigations into novel biomarkers using omics techniques generate large amounts of data. Because of their size and number of attributes, these data are suitable for analysis with machine learning methods. A key component of typical machine learning pipelines for omics data is feature selection, which reduces the raw high-dimensional data to a tractable number of features. Feature selection must balance two objectives: using as few features as possible while maintaining high predictive power. This balance is crucial when the goal of data analysis is to identify small but highly accurate panels of biomarkers with potential clinical utility. In this paper we propose a heuristic for the selection of very small feature subsets, via an iterative feature elimination process that is guided by rule-based machine learning, called RGIFE (Rule-guided Iterative Feature Elimination). We use this heuristic to identify putative biomarkers of osteoarthritis (OA), articular cartilage degradation and synovial inflammation, using both proteomic and transcriptomic datasets.

Results and discussion: Our RGIFE heuristic increased the classification accuracy for all datasets compared with the accuracy achieved when no feature selection is used, and performed well in a comparison with other feature selection methods. Using this method, the datasets were reduced to a smaller number of genes or proteins, including those known to be relevant to OA, cartilage degradation and joint inflammation. The results show that the RGIFE feature reduction method is suitable for analysing both proteomic and transcriptomic data. Methods that generate large ‘omics’ datasets are increasingly being used in the area of rheumatology.

Conclusions: Feature reduction methods are advantageous for the analysis of omics data in the field of rheumatology, as the application of such techniques is likely to result in improvements in diagnosis, treatment and drug discovery.
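The general idea of iterative feature elimination described in the abstract can be illustrated with a minimal sketch. This is not the RGIFE implementation: the abstract does not give its details, so the rule-based learner is replaced here by two stand-ins that are purely assumptions for illustration, namely a nearest-centroid classifier scored by leave-one-out accuracy and a feature ranking based on the absolute difference between class means. The function names (`loo_accuracy`, `feature_scores`, `iterative_elimination`) and the synthetic data are hypothetical.

```python
import random
from statistics import mean


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on a feature subset."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[f] for r in rows) for f in features]
        point = [X[i][f] for f in features]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == y[i]
    return correct / len(X)


def feature_scores(X, y, features):
    """Stand-in importance (assumption): |mean(class 0) - mean(class 1)| per feature.

    Scores use all samples for brevity; a careful pipeline would rank
    features inside each cross-validation fold to avoid information leakage.
    """
    scores = {}
    for f in features:
        m0 = mean(row[f] for row, label in zip(X, y) if label == 0)
        m1 = mean(row[f] for row, label in zip(X, y) if label == 1)
        scores[f] = abs(m0 - m1)
    return scores


def iterative_elimination(X, y):
    """Drop the lowest-ranked feature repeatedly while accuracy does not fall."""
    features = list(range(len(X[0])))
    best_acc = loo_accuracy(X, y, features)
    while len(features) > 1:
        scores = feature_scores(X, y, features)
        weakest = min(features, key=scores.get)
        candidate = [f for f in features if f != weakest]
        acc = loo_accuracy(X, y, candidate)
        if acc < best_acc:
            break  # removing this feature hurts accuracy, so stop
        features, best_acc = candidate, acc
    return features, best_acc


# Synthetic data (hypothetical): feature 0 separates the classes, features 1-5 are noise.
random.seed(0)
y = [0] * 10 + [1] * 10
X = [
    [random.gauss(3.0 * label, 1.0)] + [random.gauss(0.0, 1.0) for _ in range(5)]
    for label in y
]

baseline = loo_accuracy(X, y, list(range(6)))
selected, accuracy = iterative_elimination(X, y)
print("baseline accuracy:", baseline)
print("selected features:", selected, "accuracy:", accuracy)
```

The sketch removes one feature per iteration and stops at the first drop in accuracy, which keeps the trade-off between panel size and predictive power explicit. A practical heuristic would likely remove features in blocks and use a more robust validation scheme.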
【 License 】
Unknown
© Swan et al; licensee BioMed Central Ltd. 2015. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.