In this study, we propose a new method for compound identification using penalized linear regression. Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. In the context of the linear regression, the response variable is an experimental mass spectrum (i.e., query) and all the compounds in the reference library are the independent variables. However, the number of compounds in the reference library is much larger than the range of m/z values so that the data become high dimensional data with suffering from singularity. For this reason, we use penalized linear regression such as ridge regression and the Lasso. Furthermore, we also propose two-step approaches using dot product and Pearson’s correlation along with the penalized linear regression in this study.
【 预 览 】
附件列表
Files
Size
Format
View
Compound identification using penalized linear regression.