BMC Bioinformatics
CLAP: A web-server for automatic classification of proteins with special reference to multi-domain proteins
Mutharasu Gnanavel [4], Prachi Mehrotra [2], Ramaswamy Rakshambikai [1], Juliette Martin [5], Narayanaswamy Srinivasan [1], Ramachandra M Bhaskara [3]
[1] Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
[2] IISc Mathematics Initiative, Indian Institute of Science, Bangalore 560012, India
[3] Present address: Department of Theoretical Biophysics, Max-Planck Institute of Biophysics, Max-von-Laue-Straße 3, D-60438 Frankfurt am Main, Germany
[4] Present address: Molecular Signaling Lab, Signal Processing Department, Tampere University of Technology, Tampere, Finland
[5] Bases Moléculaires et Structurales des Systèmes Infectieux, CNRS, UMR 5086; Université Lyon 1; IBCP, 7 passage du Vercors, F-69367 Lyon Cedex 07, France
Keywords: Protein classification; Multi-domain proteins; Domain architectures; Alignment-free comparison
DOI  :  10.1186/1471-2105-15-343
Received 2014-03-26, accepted 2014-09-30, published 2014
【 Abstract 】

Background

The function of a protein can be deciphered more accurately from its structure than from its amino acid sequence. Because of the huge gap between the number of known protein sequences and the number of solved structures, tools that can generate functionally homogeneous clusters using sequence information alone are of great importance. Traditional alignment-based tools, which cluster proteins on the basis of sequence similarity, work well in most cases. For multi-domain proteins, however, alignment quality can be poor because of differences in protein length, domain shuffling or circular permutations. Since multi-domain proteins are ubiquitous in nature, alignment-free tools that overcome these shortcomings of alignment-based comparison are needed. Furthermore, existing tools classify proteins using only domain-level information and therefore miss the information encoded in tethered regions or accessory domains. Our method, by contrast, considers the full-length sequence of a protein, consolidating the complete sequence information to better understand a given protein.

Results

Our web server, CLAP (Classification of Proteins), is an alignment-free tool for the automatic classification of protein sequences. It uses a pattern-matching algorithm that assigns local matching scores (LMS) to residues that are part of the patterns matched between the two sequences being compared. CLAP works on full-length sequences and does not require prior domain definitions.
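
To make residue-level match scoring concrete, here is a minimal Python sketch. It is not CLAP's algorithm: the abstract does not specify how patterns are detected or how LMS values are computed, so this sketch assumes that a "pattern" is simply an exact k-mer (default k = 3, a hypothetical choice) present in both sequences. It scores each residue by the number of shared k-mer occurrences covering it and reports the fraction of covered residues as a similarity. The function names (`kmer_positions`, `residue_match_scores`, `similarity`) are illustrative only.

```python
from __future__ import annotations

from collections import defaultdict


def kmer_positions(seq: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer in seq to the start positions where it occurs."""
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def residue_match_scores(a: str, b: str, k: int = 3) -> tuple[list[int], list[int]]:
    """Per-residue scores: how many shared k-mer occurrences cover each residue.

    Hypothetical stand-in for local matching scores (LMS): a "pattern" here
    is an exact k-mer found in both sequences, wherever it occurs.
    """
    pos_a, pos_b = kmer_positions(a, k), kmer_positions(b, k)
    score_a, score_b = [0] * len(a), [0] * len(b)
    for kmer in pos_a.keys() & pos_b.keys():  # k-mers common to both sequences
        for start in pos_a[kmer]:
            for j in range(start, start + k):
                score_a[j] += 1
        for start in pos_b[kmer]:
            for j in range(start, start + k):
                score_b[j] += 1
    return score_a, score_b


def similarity(a: str, b: str, k: int = 3) -> float:
    """Fraction of residues, over both sequences, covered by a shared k-mer."""
    score_a, score_b = residue_match_scores(a, b, k)
    covered = sum(s > 0 for s in score_a) + sum(s > 0 for s in score_b)
    total = len(a) + len(b)
    return covered / total if total else 0.0
```

Because shared k-mers are collected without regard to their order, swapping two segments ("domains") leaves the score unchanged: `similarity("ACDEFGHIKL", "GHIKLACDEF")` returns `1.0`, whereas a global alignment could align only one of the two halves in order.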

Previous pilot studies on protein kinases and immunoglobulins showed that CLAP yields clusters with high functional and domain-architectural similarity. Moreover, partitioning at a statistically determined cut-off produced clusters that agreed with the subfamily-level classification of the corresponding domain family.
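
The step of partitioning at a cut-off can also be sketched. The abstract does not state which clustering method or which statistical cut-off CLAP uses, so this sketch assumes a precomputed symmetric dissimilarity matrix (for example, 1 minus a normalized pairwise similarity score) and substitutes single-linkage clustering with a user-supplied cut-off. It relies on a standard property: cutting a single-linkage tree at height t yields the connected components of the graph that joins every pair at distance ≤ t, so a union-find pass over those pairs produces the same partition. `clusters_at_cutoff` is a hypothetical name.

```python
from __future__ import annotations


def clusters_at_cutoff(dist: list[list[float]], cutoff: float) -> list[int]:
    """Single-linkage cluster labels obtained by cutting the tree at `cutoff`.

    Pairs with distance <= cutoff are merged with union-find; the resulting
    connected components equal the single-linkage clusters at that height.
    """
    n = len(dist)
    parent = list(range(n))

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive cluster ids in order of first appearance.
    labels: dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

Raising the cut-off merges clusters and lowering it splits them; choosing the threshold statistically, as the abstract describes, is outside the scope of this sketch.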

Conclusions

CLAP is a useful protein-clustering tool that is independent of domain assignment, domain order, sequence length and domain diversity. It can be applied to any set of protein sequences and yields functionally relevant clusters with high domain-architectural homogeneity. The CLAP web server is freely available for academic use at http://nslab.mbu.iisc.ernet.in/clap/.

【 License 】

   
© 2014 Gnanavel et al.; licensee BioMed Central Ltd.

【 Figures 】

Figure 1.

Figure 2.
