Journal Article Details
BMC Bioinformatics
A comparison of feature selection methodologies and learning algorithms in the development of a DNA methylation-based telomere length estimator
Research
David Corcoran [1], Karen Sugden [2], Ben Williams [2], Richie Poulton [3], Therese M. Murphy [4], Trevor Doherty [5], Sarah Jane Delany [6], Terrie E. Moffitt [7], Avshalom Caspi [7], Jonathan Mill [8], Emma Dempster [8], Eilis Hannon [8]
[1]Center for Genomic and Computational Biology, Duke University, 27708, Durham, NC, USA
[2]Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
[3]Department of Psychology, University of Otago, 9016, Dunedin, New Zealand
[4]School of Biological, Health and Sports Sciences, Technological University Dublin, Dublin, Ireland
[5]School of Biological, Health and Sports Sciences, Technological University Dublin, Dublin, Ireland
[6]SFI Centre for Research Training in Machine Learning, Technological University Dublin, Dublin, Ireland
[7]School of Computer Science, Technological University Dublin, Dublin, Ireland
[8]Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK
[9]Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
[10]University of Exeter Medical School, University of Exeter, Exeter, UK
Keywords: DNA Methylation; Telomere Length; Feature Selection; Feature Reduction; Machine Learning; Aging
DOI: 10.1186/s12859-023-05282-4
Received 2022-03-29, accepted 2023-04-11, published 2023
Source: Springer
【 Abstract 】
Background: The field of epigenomics holds great promise for understanding and treating disease, and advances in machine learning (ML) and artificial intelligence are vitally important in this pursuit. Research increasingly uses DNA methylation measures at cytosine–guanine dinucleotides (CpG) to detect disease and estimate biological traits such as aging. Because DNA methylation data are high-dimensional, feature-selection techniques are commonly employed to reduce dimensionality and identify the most important subset of features. In this study, our aim was to test and compare a range of feature-selection methods and ML algorithms in the development of a novel DNA methylation-based telomere length (TL) estimator. We utilised both nested cross-validation and two independent test sets for the comparisons.

Results: We found that principal component analysis in advance of elastic net regression led to the overall best performing estimator when evaluated using a nested cross-validation analysis and two independent test cohorts. This approach achieved a correlation between estimated and actual TL of 0.295 (83.4% CI [0.201, 0.384]) on the EXTEND test data set. In contrast, the baseline model of elastic net regression with no prior feature-reduction stage generally performed less well, suggesting that a prior feature-selection stage may have important utility. A previously developed TL estimator, DNAmTL, achieved a correlation of 0.216 (83.4% CI [0.118, 0.310]) on the EXTEND data. Additionally, we observed that different DNA methylation-based TL estimators, which have few CpGs in common, are associated with many of the same biological entities.

Conclusions: The variance in performance across the tested approaches shows that estimators are sensitive to data set heterogeneity, and the development of an optimal DNA methylation-based estimator should benefit from the robust methodological approach used in this study. Moreover, our methodology, which utilises a range of feature-selection approaches and ML algorithms, could be applied to other biological markers and disease phenotypes to examine their relationship with DNA methylation and their predictive value.
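The best-performing approach named in the abstract (principal component analysis followed by elastic net regression, assessed with nested cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic stand-ins for a methylation matrix, and the sample size, number of CpGs, hyperparameter grid, and fold counts are all assumptions chosen only to keep the example small. It uses scikit-learn's `Pipeline`, `GridSearchCV`, and `cross_val_predict`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a methylation matrix (samples x CpGs); NOT real data.
rng = np.random.default_rng(0)
n_samples, n_cpgs = 120, 500            # hypothetical sizes
latent = rng.normal(size=(n_samples, 5))
loadings = rng.normal(size=(5, n_cpgs))
X = latent @ loadings + rng.normal(size=(n_samples, n_cpgs))
# Synthetic "telomere length" driven by two latent factors plus noise.
y = latent[:, 0] - 0.5 * latent[:, 1] + rng.normal(scale=0.5, size=n_samples)

# Feature reduction (PCA) followed by elastic net regression.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(random_state=0)),
    ("enet", ElasticNet(max_iter=10_000)),
])

# Hypothetical hyperparameter grid, tuned in the inner loop only.
param_grid = {
    "pca__n_components": [5, 20],
    "enet__alpha": [0.01, 0.1, 1.0],
    "enet__l1_ratio": [0.2, 0.8],
}

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: select hyperparameters. Outer loop: predict held-out samples
# with a model that never saw them during scaling, PCA, or tuning.
search = GridSearchCV(pipeline, param_grid, cv=inner_cv,
                      scoring="neg_mean_squared_error")
y_pred = cross_val_predict(search, X, y, cv=outer_cv)

# Correlation between estimated and actual values, as reported in the abstract.
r = np.corrcoef(y, y_pred)[0, 1]
print(f"Nested-CV correlation between estimated and actual values: {r:.3f}")
```

Placing the scaler and PCA inside the pipeline means they are refitted on each training fold, so no information from held-out samples leaks into the feature-reduction step.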
【 License 】

CC BY   
© The Author(s) 2023
