Journal article details
BMC Bioinformatics
Label noise in subtype discrimination of class C G protein-coupled receptors: A systematic approach to the analysis of classification errors
Research Article
Caroline König, Alfredo Vellido, Martha I Cárdenas, René Alquézar, Jesús Giraldo
Affiliations: Dept. of Computer Science, Univ. Politècnica de Catalunya, C. Jordi Girona, 1-3, 08034, Barcelona, Spain; Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), 08193, Cerdanyola del Vallès, Barcelona, Spain; Institut de Neurociències, Unitat de Bioestadística, Univ. Autònoma de Barcelona, 08193, Cerdanyola del Vallès, Barcelona, Spain; Institut de Robòtica i Informàtica Industrial, CSIC-UPC, 08034, Barcelona, Spain
Keywords: G protein-coupled receptors; Label noise; Support vector machines; Phylogenetic trees
DOI: 10.1186/s12859-015-0731-9
Received 2015-04-15, accepted 2015-08-31, published 2015
Source: Springer
【Abstract】

Background
The characterization of proteins into families and subfamilies, at different levels, entails the definition and use of class labels. When the assignment of a protein to a family is uncertain, or even wrong, this becomes an instance of what is known as a label noise problem. Label noise can negatively affect any quantitative analysis of proteins that depends on label information. This study investigates class C of G protein-coupled receptors (GPCRs). These cell membrane proteins are relevant both to biology in general and to pharmacology in particular. Their supervised classification into known subtypes from primary sequence data is hampered by label noise. This noise may stem from a combination of two factors: limitations in expert knowledge, and the lack of a clear correspondence between the labels, which mostly reflect GPCR functionality, and the different representations of the protein primary sequences.

Results
In this study, we describe a systematic approach to the analysis of GPCR misclassifications using Support Vector Machine (SVM) classifiers. As a proof of concept, we use this approach to help discover labeling quality problems in a curated, publicly accessible database of this type of protein. We also investigate the extent to which physico-chemical transformations of the protein sequences reflect GPCR subtype labeling. The candidate mislabeled cases detected with this approach are externally validated with phylogenetic trees. They are also checked against further trusted sources, such as the National Center for Biotechnology Information, Universal Protein Resource, European Bioinformatics Institute and Ensembl Genome Browser information repositories.

Conclusions
In quantitative classification problems, class labels are often assumed by default to be correct. Label noise, though, is bound to be a pervasive problem in bioinformatics, where labels may be obtained indirectly through complex, multi-step similarity modelling processes. In the case of GPCRs, minimizing this problem requires methods capable of singling out and characterizing those sequences with consistent misclassification behaviour. This study proposes a systematic, SVM-based method for this purpose. The proposed method enables a filtering approach to the label noise problem and might become a support tool for database curators in proteomics.
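
The abstract asks how far physico-chemical transformations of the sequences reflect subtype labels. The sketch below is illustrative only and shows what one such transformation can look like. Each residue is mapped to a single physico-chemical property, and the sequence is summarized by its auto-covariances at a few lags. This yields a fixed-length vector for sequences of any length. The Kyte-Doolittle hydropathy scale, the function name `hydropathy_autocovariance` and the default `max_lag=5` are assumptions made for this sketch. The abstract does not state which properties, scales or lags the study used.

```python
# Illustrative physico-chemical transformation of a protein sequence.
# Assumption: Kyte-Doolittle hydropathy as the single property; the study's
# actual properties and parameters are not given in the abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_autocovariance(sequence: str, max_lag: int = 5) -> list[float]:
    """Return mean-centred auto-covariances of hydropathy for lags 1..max_lag.

    Residues not in the scale (e.g. 'X', gaps) are skipped. The result has
    max_lag entries regardless of sequence length.
    """
    values = [KYTE_DOOLITTLE[r] for r in sequence.upper() if r in KYTE_DOOLITTLE]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence too short for the requested max_lag")
    mean = sum(values) / n
    centred = [v - mean for v in values]
    features = []
    for lag in range(1, max_lag + 1):
        # Average product of property deviations for residues `lag` positions apart.
        total = sum(centred[i] * centred[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features
```

For example, `hydropathy_autocovariance("IKIKIK", max_lag=2)` gives a negative value at lag 1 and a positive value at lag 2. This reflects the strict alternation of hydrophobic and charged residues.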

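The abstract calls for singling out sequences with consistent misclassification behaviour. The sketch below shows one way "consistent" could be made operational, assuming a simple scheme that is not necessarily the authors' own. Repeat k-fold cross-validation several times and record how often each held-out sample is misclassified. Then flag the samples whose misclassification frequency reaches a threshold. The study used SVMs. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the SVM here; in practice one might swap in an SVM such as scikit-learn's `SVC`. All function names, the 0.9 threshold and the fold and repeat counts are hypothetical choices, not settings reported in the paper.

```python
import random
from collections import defaultdict

# Hypothetical helpers for illustration. A nearest-centroid classifier
# stands in for the SVM used in the study, keeping the sketch dependency-free.


def nearest_centroid_fit(X, y):
    """Return one mean feature vector (centroid) per class label."""
    sums = {}
    counts = defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], x))
    return min(centroids, key=sq_dist)


def misclassification_frequency(X, y, n_repeats=10, n_folds=5, seed=0):
    """For each sample, the fraction of repeated k-fold cross-validation
    runs in which it is misclassified while held out."""
    rng = random.Random(seed)
    errors = [0] * len(y)
    indices = list(range(len(y)))
    for _ in range(n_repeats):
        rng.shuffle(indices)  # new random fold assignment per repeat
        for k in range(n_folds):
            test = set(indices[k::n_folds])  # each sample is tested once per repeat
            train = [i for i in indices if i not in test]
            model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
            for i in test:
                if nearest_centroid_predict(model, X[i]) != y[i]:
                    errors[i] += 1
    return [e / n_repeats for e in errors]


def flag_candidates(X, y, threshold=0.9, **cv_kwargs):
    """Indices of samples misclassified in at least `threshold` of the runs."""
    freqs = misclassification_frequency(X, y, **cv_kwargs)
    return [i for i, f in enumerate(freqs) if f >= threshold]


if __name__ == "__main__":
    # Toy data: two well-separated groups; the last point sits among the
    # "A" points but carries the label "B" (a simulated labeling error).
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
         [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5], [0.2, 0.3]]
    y = ["A"] * 5 + ["B"] * 6
    print(flag_candidates(X, y))  # [10]
```

Flagged indices are only candidates. As the abstract describes, they would still need external validation, for example with phylogenetic trees or against NCBI, UniProt, EBI and Ensembl records.
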
【License】

CC BY
© König et al. 2015
