Journal article details
BMC Bioinformatics
Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent
Noah Simon [1], Pål Olof Westermark [2], Dörte Wittenburg [2], Jan Klosa [2], Volkmar Liebscher [3]
[1] Department of Biostatistics, University of Washington, 98195, Seattle, WA, USA
[2] Institute of Genetics and Biometry, Leibniz Institute for Farm Animal Biology, 18196, Dummerstorf, Germany
[3] Institute of Mathematics and Computer Science, University of Greifswald, 17489, Greifswald, Germany
Keywords: Optimization; Machine learning; High-dimensional data; R package
DOI: 10.1186/s12859-020-03725-w
Source: Springer
【 Abstract 】

Background: Statistical analyses of biological problems in the life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in the case of multicollinearity, which arises when the number of explanatory variables exceeds the number of observations, or for some biological reason. The model's goodness of fit is then penalized by a suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. The grid search for the penalty parameter is realized by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, seagull (the R package presented here) produces complete regularization paths.

Results: Publicly available high-dimensional methylation data are used to compare seagull to the established R package SGL. The results of both packages enabled a precise prediction of biological age from DNA methylation status. Although the results of seagull and SGL were very similar (R² > 0.99), seagull computed the solution in a fraction of the time needed by SGL. Additionally, seagull enables the incorporation of weights for each penalized feature.

Conclusions: The following operators for linear regression models are available in seagull: lasso, group lasso, sparse-group lasso and Integrative LASSO with Penalty Factors (IPF-lasso). Thus, seagull is a convenient envelope of lasso variants.
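
The ingredients named in the abstract (proximal gradient descent, backtracking line search, warm starts along a penalty grid) can be illustrated with a minimal pure-Python sketch. It is not seagull's code or API. All function names are hypothetical, and the loss scaling 1/(2n), the sqrt(group size) weights, the halving factor and the descending penalty grid are assumptions chosen for illustration. Setting alpha = 1 gives the lasso and alpha = 0 the group lasso.

```python
import math

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in z]

def prox_sparse_group(z, groups, t, lam, alpha):
    """Prox of t*lam*(alpha*||b||_1 + (1-alpha)*sum_g sqrt(p_g)*||b_g||_2)."""
    out = soft_threshold(z, t * lam * alpha)
    for idx in groups:  # then shrink each group towards zero as a block
        norm = math.sqrt(sum(out[j] ** 2 for j in idx))
        thresh = t * lam * (1.0 - alpha) * math.sqrt(len(idx))
        scale = max(1.0 - thresh / norm, 0.0) if norm > 0.0 else 0.0
        for j in idx:
            out[j] *= scale
    return out

def residual(X, y, b):
    return [yi - sum(x * bj for x, bj in zip(row, b)) for row, yi in zip(X, y)]

def loss(X, y, b):
    """Smooth part of the objective: ||y - Xb||^2 / (2n) (assumed scaling)."""
    return sum(r * r for r in residual(X, y, b)) / (2.0 * len(y))

def gradient(X, y, b):
    r, n = residual(X, y, b), len(y)
    return [-sum(X[i][j] * r[i] for i in range(n)) / n for j in range(len(b))]

def fit(X, y, groups, lam, alpha, b0, max_iter=1000, tol=1e-10, shrink=0.5):
    """Proximal gradient descent for one penalty value, started at b0."""
    b, t = list(b0), 1.0
    for _ in range(max_iter):
        g, f_old = gradient(X, y, b), loss(X, y, b)
        while True:  # backtracking: shrink t until the quadratic bound holds
            step = [bj - t * gj for bj, gj in zip(b, g)]
            new = prox_sparse_group(step, groups, t, lam, alpha)
            d = [nj - bj for nj, bj in zip(new, b)]
            bound = (f_old + sum(gj * dj for gj, dj in zip(g, d))
                     + sum(dj * dj for dj in d) / (2.0 * t))
            if loss(X, y, new) <= bound + 1e-12:
                break
            t *= shrink
        b = new
        if max(abs(dj) for dj in d) < tol:
            break
    return b

def regularization_path(X, y, groups, lambdas, alpha):
    """Solve from the largest penalty down, warm-starting each fit."""
    b, path = [0.0] * len(X[0]), []
    for lam in sorted(lambdas, reverse=True):
        b = fit(X, y, groups, lam, alpha, b0=b)  # warm start from previous fit
        path.append((lam, b))
    return path

# Toy usage: identity design, two features, each in its own group.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
path = regularization_path(X, y, groups=[[0], [1]], lambdas=[0.5, 2.0], alpha=1.0)
```

Starting each fit from the previous solution is what makes the grid search cheap: neighbouring penalty values tend to have similar solutions, so few iterations are needed per grid point.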

【 License 】

CC BY   
