Journal Article Details
BMC Bioinformatics
Gdaphen, R pipeline to identify the most important qualitative and quantitative predictor variables from phenotypic data
Software
Claire Gavériaux-Ruff, Maria del Mar Muñiz Moreno, Yann Herault
Affiliations: Université de Strasbourg, CNRS UMR7104, INSERM U1258, Institut de Génétique, Biologie Moléculaire Et Cellulaire (IGBMC), 1 Rue Laurent Fries, 67404, Illkirch Graffenstaden, France; John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, 33136, Miami, FL, USA; Université de Strasbourg, CNRS, INSERM, CELPHEDIA, PHENOMIN-Institut Clinique de La Souris (ICS), 1 Rue Laurent Fries, 67404, Illkirch Graffenstaden, France
Keywords: R package; Phenotypic data; Clinical data; Discrimination; Generalized linear models; Random forest; Imputation; Model; Prediction; Machine learning; Bootstrapping
DOI: 10.1186/s12859-022-05111-0
Received 2022-06-30, accepted 2022-12-12, publication year 2022
Source: Springer
【 Abstract 】

Background: In individuals or animals suffering from genetic or acquired diseases, it is important to identify which clinical or phenotypic variables can be used to discriminate between disease and non-disease states, responses to treatment, or sexual dimorphism. However, the data often suffer from a low number of samples, a high number of variables, or unbalanced experimental designs. Moreover, several parameters can be recorded in the same test, so correlations should be assessed and a more complex statistical framework is necessary for the analysis. Packages that provide the required analysis tools already exist, but they are not available together, which makes choosing and implementing a method difficult for non-statisticians.

Results: We present Gdaphen, a fast joint pipeline that identifies the most important qualitative and quantitative predictor variables for discriminating between genotypes, treatments, or sexes. Gdaphen takes behavioral or clinical data as input and uses Multiple Factor Analysis (MFA) to deal with groups of variables recorded from the same individuals or to anonymize genotype-based recordings. As optimized input, Gdaphen uses the non-correlated variables with 30% correlation or higher on the MFA-Principal Component Analysis (PCA), which increases the discriminative power and the efficiency of the classifier’s predictive model. Gdaphen can determine the strongest variables that predict gene dosage effects using General Linear Model (GLM)-based classifiers, or determine the most discriminative non-linearly distributed variables using its Random Forest (RF) implementation. Moreover, Gdaphen reports the efficacy of each classifier and offers several visualization options that support the results as easily readable plots ready to be included in publications. We demonstrate Gdaphen’s capabilities on several datasets and provide easy-to-follow vignettes.

Conclusions: Gdaphen makes the analysis of phenotypic data much easier for medical or preclinical behavioral researchers by providing an integrated framework to perform: (1) pre-processing steps such as data imputation or anonymization; (2) a full statistical assessment to identify which variables are the most important discriminators; and (3) state-of-the-art visualizations, ready for publication, that support the conclusions of the analyses. Gdaphen is open-source and freely available at https://github.com/munizmom/gdaphen, together with vignettes, documentation for the functions, and examples to guide users in their own implementation.
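
The abstract mentions feeding non-correlated variables to the classifiers. As a rough illustration of that general idea only, the sketch below greedily drops variables that are highly correlated with one already kept. It is not Gdaphen's code or API, and it is written in Python rather than R. The function names, the toy measurements, and the 0.75 cutoff are all assumptions made for this example; the cutoff in particular is not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.75):
    """Greedily keep variables whose |r| with every kept variable is below threshold.

    `columns` maps a variable name to its list of values (hypothetical layout).
    `threshold` is an arbitrary illustrative cutoff, not a value from the paper.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy phenotypic measurements (made up for the example).
data = {
    "weight": [20.1, 22.3, 19.8, 25.0, 23.4],
    "weight_dup": [40.2, 44.6, 39.6, 50.0, 46.8],  # exactly 2 * weight
    "latency": [5.0, 3.2, 4.1, 4.4, 3.9],
}
selected = drop_correlated(data)
print(selected)
```

Here `weight_dup` is a rescaled copy of `weight`, so it is discarded, while `latency` is only weakly correlated with `weight` and is kept. A greedy pass like this depends on column order; a real pipeline would typically choose which member of a correlated pair to keep using a more principled criterion.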

【 License 】

CC BY   
© The Author(s) 2023
