| BMC Bioinformatics | |
| Amplitude spectrum distance: measuring the global shape divergence of protein fragments | |
| Clovis Galiez [1], François Coste [1] | |
| [1] Inria Rennes - Bretagne Atlantique, Rennes, France | |
| Keywords: Insertions and deletions; Pseudometric; Fourier transform; Structural comparison; Protein | |
| Others : 1229846; DOI : 10.1186/s12859-015-0693-y | |
| Received 2015-03-05, accepted 2015-07-31, published 2015 | |
【 Abstract 】
Background
In structural bioinformatics, there is growing interest in identifying local protein structures, regarded as key structural or functional building blocks of proteins, and in understanding their evolution. A central need is therefore to compare these fragments, which may be short, by measuring their (dis)similarity efficiently and accurately. Progress towards this goal has produced scores that can assess strong similarity between fragments. However, more progressive scores with meaningful intermediate values are still lacking for the comparison, retrieval or clustering of distantly related fragments.
Results
We introduce the Amplitude Spectrum Distance (ASD), a novel way of comparing protein fragments based on the discrete Fourier transform of their Cα distance matrices. Defined as the distance between the amplitude spectra of two fragments, ASD can be computed efficiently and provides a parameter-free measure of their global shape dissimilarity. ASD inherits useful theoretical properties that make it tolerant to shifts, insertions, deletions, circular permutations and sequence reversals while satisfying the triangle inequality. The practical interest of ASD with respect to the RMSD, RMSD_d, BC and TM scores is illustrated through zinc finger retrieval experiments and concrete structure examples. Its benefits are further illustrated by two clustering experiments, on domain linker fragments and on complementarity-determining regions of antibodies.
Conclusions
By using the Fourier transform to compare fragments at the level of global shape, ASD is an objective and progressive measure that takes whole fragments into account. Its practical computation time and its properties make ASD particularly relevant for applications that require meaningful measures on distantly related protein fragments, such as the high-recall retrieval of similar fragments shown in the experiments, and for applications that also exploit the triangle inequality, such as fragment clustering.
The ASD program and source code are freely available at: http://www.irisa.fr/dyliss/public/ASD/.
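The idea summarised in the Results paragraph (a distance between the amplitude spectra of the Fourier-transformed Cα distance matrices) can be illustrated with a minimal NumPy sketch. This is not the authors' program: the function names, the zero-padding of both matrices to a common size, and the use of an unnormalised Frobenius norm are assumptions made only for illustration, since the abstract does not specify these details.

```python
import numpy as np


def distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates (n x 3 array)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def amplitude_spectrum(coords, size):
    """Modulus of the 2D DFT of the distance matrix.

    The matrix is zero-padded to size x size (an assumption of this sketch).
    """
    d = distance_matrix(coords)
    return np.abs(np.fft.fft2(d, s=(size, size)))


def asd_sketch(coords_a, coords_b):
    """Illustrative amplitude-spectrum distance between two fragments.

    Assumption: Frobenius norm of the difference of the two amplitude
    spectra, without any normalisation.
    """
    size = max(len(coords_a), len(coords_b))
    spec_a = amplitude_spectrum(coords_a, size)
    spec_b = amplitude_spectrum(coords_b, size)
    return float(np.linalg.norm(spec_a - spec_b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frag = rng.normal(size=(12, 3))          # toy coordinates, not a real fragment
    other = rng.normal(size=(12, 3))
    print("reversed:", asd_sketch(frag, frag[::-1]))
    print("circularly permuted:", asd_sketch(frag, np.roll(frag, 3, axis=0)))
    print("unrelated:", asd_sketch(frag, other))
```

For two fragments of equal length, no padding occurs, and the sketch gives a distance of zero (up to floating-point error) for a sequence reversal or a circular permutation, because both operations change only the phase of the transform and not its amplitude. The triangle inequality follows, for equal-length fragments, from the fact that the distance is a norm of a difference of spectra.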
【 License 】
2015 Galiez and Coste.
| Files | Size | Format | View |
|---|---|---|---|
| Fig. 15. | 33KB | Image | |
| Fig. 14. | 97KB | Image | |
| Fig. 13. | 59KB | Image | |
| Fig. 12. | 75KB | Image | |
| Fig. 11. | 46KB | Image | |
| Fig. 10. | 19KB | Image | |
| Fig. 9. | 20KB | Image | |
| Fig. 8. | 18KB | Image | |
| Fig. 7. | 27KB | Image | |
| Fig. 6. | 23KB | Image | |
| Fig. 5. | 16KB | Image | |
| Fig. 4. | 33KB | Image | |
| Fig. 3. | 23KB | Image | |
| Fig. 2. | 11KB | Image | |
| Fig. 1. | 14KB | Image | |
【 Figures 】
Fig. 1.
Fig. 2.
Fig. 3.
Fig. 4.
Fig. 5.
Fig. 6.
Fig. 7.
Fig. 8.
Fig. 9.
Fig. 10.
Fig. 11.
Fig. 12.
Fig. 13.
Fig. 14.
Fig. 15.