Journal Article Details
BMC Bioinformatics
Quality control of imbalanced mass spectra from isotopic labeling experiments
Min Gan [1], Tianjun Li [2], Long Chen [2]
[1] College of Mathematics and Computer Science, Fuzhou University
[2] Department of Computer and Information Science, University of Macau
Keywords: Mass Spectra; Proteomics; Imbalanced Data; Quality Control; Gradient Boosting
DOI: 10.1186/s12859-019-3170-1
Source: DOAJ
【 Abstract 】

Background: In isotope-labeled proteomics experiments, mass spectra are usually acquired by liquid chromatography-mass spectrometry (LC-MS). In such experiments, the mass profiles of labeled (heavy) and unlabeled (light) peptide pairs are represented by isotope clusters (2D or 3D) that provide valuable information about the studied biological samples under different conditions. The core task of quality control in a quantitative LC-MS experiment is to filter out low-quality peptides with questionable profiles. Classification approaches are commonly used for this task. However, previous quality control methods often ignore or mishandle the data imbalance problem. In this study, we introduce a quality control framework based on the extreme gradient boosting machine (XGBoost) and carefully address the imbalanced data problem within it.

Results: In the XGBoost-based framework, we apply the synthetic minority over-sampling technique (SMOTE) to re-balance the data and use the balanced data to train boosted trees as the classifier. The classifier is then applied to other data for peptide quality assessment. Experimental results show that the proposed framework significantly increases the reliability of peptide heavy-light ratio estimation.

Conclusions: Our results indicate that this framework is a powerful method for peptide quality assessment. For feature extraction, the extracted ion chromatogram (XIC) based features contribute to the peptide quality assessment. For the imbalanced data problem, SMOTE yields much better classification performance. Finally, XGBoost is well suited to peptide quality control. Overall, the proposed framework provides reliable results for further proteomics studies.
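
The re-balancing step named in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function names `smote` and `rebalance`, the two-feature toy data, and the choice of `k` are all assumptions made for illustration, and the abstract does not specify the actual features or parameters. In practice one would more likely use an existing SMOTE implementation (for example the `SMOTE` class in imbalanced-learn) and train an XGBoost classifier on the balanced output.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def rebalance(X, y, minority_label, k=5, seed=0):
    """Over-sample the minority class of a binary data set until both
    classes have the same number of samples."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, max(0, n_majority - len(minority)), k, seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Hypothetical toy data: two made-up features per peptide,
# label 1 = high quality (majority), label 0 = low quality (minority).
X = [[0.90, 0.80], [0.85, 0.90], [0.95, 0.70], [0.80, 0.85],
     [0.90, 0.95], [0.88, 0.75], [0.20, 0.30], [0.30, 0.10]]
y = [1, 1, 1, 1, 1, 1, 0, 0]

X_bal, y_bal = rebalance(X, y, minority_label=0, k=1)
print(y_bal.count(0), y_bal.count(1))
```

Only the training data should be over-sampled in this way; the data on which the trained classifier is later applied is left untouched, so that the assessment reflects the real class distribution.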

【 License 】

Unknown   
