• 已选条件:
  • × Xin Wang
  • × BMC Bioinformatics
 全选  【符合条件的数据共:5条】

BMC Bioinformatics,2012年

Xin Wang, Mariza de Andrade, Stacey J Winham, Colin L Colby, Robert R Freimuth, Joanna M Biernacka, Marianne Huebner

LicenseType:CC BY |

预览  |  原文链接  |  全文  [ 浏览:0 下载:0  ]    

BackgroundIdentifying variants associated with complex human traits in high-dimensional data is a central goal of genome-wide association studies. However, complicated etiologies such as gene-gene interactions are ignored by the univariate analysis usually applied in these studies. Random Forests (RF) are a popular data-mining technique that can accommodate a large number of predictor variables and allow for complex models with interactions. RF analysis produces measures of variable importance that can be used to rank the predictor variables. Thus, single nucleotide polymorphism (SNP) analysis using RFs is gaining popularity as a potential filter approach that considers interactions in high-dimensional data. However, the impact of data dimensionality on the power of RF to identify interactions has not been thoroughly explored. We investigate the ability of rankings from variable importance measures to detect gene-gene interaction effects and their potential effectiveness as filters compared to p-values from univariate logistic regression, particularly as the data becomes increasingly high-dimensional.ResultsRF effectively identifies interactions in low dimensional data. As the total number of predictor variables increases, probability of detection declines more rapidly for interacting SNPs than for non-interacting SNPs, indicating that in high-dimensional data the RF variable importance measures are capturing marginal effects rather than capturing the effects of interactions.ConclusionsWhile RF remains a promising data-mining technique that extends univariate methods to condition on multiple variables simultaneously, RF variable importance measures fail to detect interaction effects in high-dimensional data in the absence of a strong marginal component, and therefore may not be useful as a filter technique that allows for interaction effects in genome-wide data.

    BMC Bioinformatics,2022年

    Jie Li, Yadong Liu, Guohua Wang, Xin Wang

    LicenseType:CC BY |

    预览  |  原文链接  |  全文  [ 浏览:0 下载:0  ]    

    BMC Bioinformatics,2012年

    Joanna M Biernacka, Marianne Huebner, Mariza de Andrade, Xin Wang, Robert R Freimuth, Colin L Colby, Stacey J Winham

    英文

    预览  |  原文链接  |  全文  [ 浏览:1 下载:0  ]    

    Background

    Identifying variants associated with complex human traits in high-dimensional data is a central goal of genome-wide association studies. However, complicated etiologies such as gene-gene interactions are ignored by the univariate analysis usually applied in these studies. Random Forests (RF) are a popular data-mining technique that can accommodate a large number of predictor variables and allow for complex models with interactions. RF analysis produces measures of variable importance that can be used to rank the predictor variables. Thus, single nucleotide polymorphism (SNP) analysis using RFs is gaining popularity as a potential filter approach that considers interactions in high-dimensional data. However, the impact of data dimensionality on the power of RF to identify interactions has not been thoroughly explored. We investigate the ability of rankings from variable importance measures to detect gene-gene interaction effects and their potential effectiveness as filters compared to p-values from univariate logistic regression, particularly as the data becomes increasingly high-dimensional.

    Results

    RF effectively identifies interactions in low dimensional data. As the total number of predictor variables increases, probability of detection declines more rapidly for interacting SNPs than for non-interacting SNPs, indicating that in high-dimensional data the RF variable importance measures are capturing marginal effects rather than capturing the effects of interactions.

    Conclusions

    While RF remains a promising data-mining technique that extends univariate methods to condition on multiple variables simultaneously, RF variable importance measures fail to detect interaction effects in high-dimensional data in the absence of a strong marginal component, and therefore may not be useful as a filter technique that allows for interaction effects in genome-wide data.

      BMC Bioinformatics,2012年

      Joanna M Biernacka, Marianne Huebner, Mariza de Andrade, Xin Wang, Robert R Freimuth, Colin L Colby, Stacey J Winham

      英文

      预览  |  原文链接  |  全文  [ 浏览:0 下载:0  ]    

      Background

      Identifying variants associated with complex human traits in high-dimensional data is a central goal of genome-wide association studies. However, complicated etiologies such as gene-gene interactions are ignored by the univariate analysis usually applied in these studies. Random Forests (RF) are a popular data-mining technique that can accommodate a large number of predictor variables and allow for complex models with interactions. RF analysis produces measures of variable importance that can be used to rank the predictor variables. Thus, single nucleotide polymorphism (SNP) analysis using RFs is gaining popularity as a potential filter approach that considers interactions in high-dimensional data. However, the impact of data dimensionality on the power of RF to identify interactions has not been thoroughly explored. We investigate the ability of rankings from variable importance measures to detect gene-gene interaction effects and their potential effectiveness as filters compared to p-values from univariate logistic regression, particularly as the data becomes increasingly high-dimensional.

      Results

      RF effectively identifies interactions in low dimensional data. As the total number of predictor variables increases, probability of detection declines more rapidly for interacting SNPs than for non-interacting SNPs, indicating that in high-dimensional data the RF variable importance measures are capturing marginal effects rather than capturing the effects of interactions.

      Conclusions

      While RF remains a promising data-mining technique that extends univariate methods to condition on multiple variables simultaneously, RF variable importance measures fail to detect interaction effects in high-dimensional data in the absence of a strong marginal component, and therefore may not be useful as a filter technique that allows for interaction effects in genome-wide data.

        BMC Bioinformatics,2012年

        Xin Wang, Mariza de Andrade, Stacey J Winham, Colin L Colby, Robert R Freimuth, Joanna M Biernacka, Marianne Huebner

        LicenseType:CC BY |

        预览  |  原文链接  |  全文  [ 浏览:0 下载:0  ]    

        BackgroundIdentifying variants associated with complex human traits in high-dimensional data is a central goal of genome-wide association studies. However, complicated etiologies such as gene-gene interactions are ignored by the univariate analysis usually applied in these studies. Random Forests (RF) are a popular data-mining technique that can accommodate a large number of predictor variables and allow for complex models with interactions. RF analysis produces measures of variable importance that can be used to rank the predictor variables. Thus, single nucleotide polymorphism (SNP) analysis using RFs is gaining popularity as a potential filter approach that considers interactions in high-dimensional data. However, the impact of data dimensionality on the power of RF to identify interactions has not been thoroughly explored. We investigate the ability of rankings from variable importance measures to detect gene-gene interaction effects and their potential effectiveness as filters compared to p-values from univariate logistic regression, particularly as the data becomes increasingly high-dimensional.ResultsRF effectively identifies interactions in low dimensional data. As the total number of predictor variables increases, probability of detection declines more rapidly for interacting SNPs than for non-interacting SNPs, indicating that in high-dimensional data the RF variable importance measures are capturing marginal effects rather than capturing the effects of interactions.ConclusionsWhile RF remains a promising data-mining technique that extends univariate methods to condition on multiple variables simultaneously, RF variable importance measures fail to detect interaction effects in high-dimensional data in the absence of a strong marginal component, and therefore may not be useful as a filter technique that allows for interaction effects in genome-wide data.