BMC Bioinformatics
Identifying structural domains of proteins using clustering
Howard J Feldman [1]
[1] Chemical Computing Group, Inc., 1010 Sherbrooke St. W., Suite 910, Montreal, Quebec, H3A 2R7, Canada
Keywords: Structural domain; Average-linkage; Agglomerative clustering; Domain assignment
DOI: 10.1186/1471-2105-13-286
Received 2012-01-26, accepted 2012-10-29, published 2012
【 Abstract 】
Background
Protein structures are composed of modular elements known as domains. These units are reused repeatedly in nature and usually serve a particular function in the structure. It is therefore useful to be able to break a protein of interest into its component domains, for example prior to similarity searching. Numerous computational methods exist for doing so, but most operate only on a single protein chain, and many are limited to making a series of cuts to the sequence, even though domains can and do span multiple chains.
Results
This study presents a novel clustering-based approach to domain identification that works equally well on individual chains or entire complexes. The method is simple and fast, taking only a few milliseconds to run. It works by applying average-linkage clustering to either vectors representing secondary structure elements or buried alpha-carbon positions. Each resulting cluster corresponds to a domain of the structure. The method is competitive with others, achieving 70% agreement with SCOP on a large non-redundant data set, and 80% on a set more heavily weighted toward multi-domain proteins on which both SCOP and CATH agree.
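The clustering step described above can be illustrated with a minimal sketch of average-linkage agglomerative clustering on 3D points. This is not the author's implementation: the function name `average_linkage`, the distance `cutoff` used as a stopping rule, and the toy coordinates are assumptions made for illustration, and the selection of buried alpha carbons is not shown.

```python
import math


def average_linkage(points, cutoff):
    """Agglomerative average-linkage clustering of 3D points.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance, stopping once that distance exceeds `cutoff` (a hypothetical
    stopping rule, not taken from the paper). Returns a sorted list of
    clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        # Mean Euclidean distance over all cross-cluster pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > cutoff:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Toy stand-in for alpha-carbon coordinates: two well-separated groups.
toy_ca = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (20, 0, 0), (21, 0, 0), (20, 1, 0)]
domains = average_linkage(toy_ca, cutoff=5.0)
print(domains)  # [[0, 1, 2], [3, 4, 5]]
```

Because the input is just a set of coordinates, nothing in the sketch depends on chain boundaries or sequence order, which is why the same procedure applies to a single chain or to a whole complex.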
Conclusions
It is encouraging that a basic method such as this performs nearly as well as, or better than, some far more complex approaches. This suggests that protein domains are, for the most part, simply compact regions of structure with a higher density of buried contacts within themselves than between each other. Representing the structure as a set of points or vectors in space frees the method from the artificial limitations that other approaches may depend upon.
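The notion of a higher contact density within domains than between them can be made concrete with a small sketch. It is an illustration only: the function name `contact_fractions`, the contact `radius`, and the toy coordinates and labels are assumptions, not values or definitions from the paper.

```python
import math
from itertools import combinations


def contact_fractions(points, labels, radius):
    """Fraction of point pairs in contact, within and between domains.

    A pair is "in contact" when its Euclidean distance is at most `radius`
    (a hypothetical threshold). Returns (within_fraction, between_fraction).
    """
    within = [0, 0]   # [contacts, possible pairs] sharing a label
    between = [0, 0]  # [contacts, possible pairs] with different labels
    for i, j in combinations(range(len(points)), 2):
        bucket = within if labels[i] == labels[j] else between
        bucket[1] += 1
        if math.dist(points[i], points[j]) <= radius:
            bucket[0] += 1
    frac = lambda b: b[0] / b[1] if b[1] else 0.0
    return frac(within), frac(between)


# Toy coordinates with a two-domain labelling.
coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (20, 0, 0), (21, 0, 0), (20, 1, 0)]
domain_of = [0, 0, 0, 1, 1, 1]
print(contact_fractions(coords, domain_of, radius=2.0))  # (1.0, 0.0)
```

A domain assignment consistent with the conclusion above would show a within-domain fraction well above the between-domain fraction.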
【 License 】
2012 Feldman; licensee BioMed Central Ltd.