BMC Bioinformatics | |
FISH Amyloid – a new method for finding amyloidogenic segments in proteins based on site specific co-occurence of aminoacids | |
Pawel Gasior1  Malgorzata Kotulska1  | |
[1] Institute of Biomedical Engineering and Instrumentation, Wroclaw University of Technology, 50-370 Wroclaw, Poland | |
关键词: Hot spot; Intramolecular contact sites; Amyloid; Machine learning; | |
Others : 1087614 DOI : 10.1186/1471-2105-15-54 |
|
received in 2013-03-27, accepted in 2014-02-03, 发布年份 2014 | |
【 摘 要 】
Background
Amyloids are proteins capable of forming fibrils whose intramolecular contact sites assume densely packed zipper pattern. Their oligomers can underlie serious diseases, e.g. Alzheimer’s and Parkinson’s diseases. Recent studies show that short segments of aminoacids can be responsible for amyloidogenic properties of a protein. A few hundreds of such peptides have been experimentally found but experimental testing of all candidates is currently not feasible. Here we propose an original machine learning method for classification of aminoacid sequences, based on discovering a segment with a discriminative pattern of site-specific co-occurrences between sequence elements. The pattern is based on the positions of residues with correlated occurrence over a sliding window of a specified length. The algorithm first recognizes the most relevant training segment in each positive training instance. Then the classification is based on maximal distances between co-occurrence matrix of the relevant segments in positive training sequences and the matrix from negative training segments. The method was applied for studying sequences of aminoacids with regard to their amyloidogenic properties.
Results
Our method was first trained on available datasets of hexapeptides with the amyloidogenic classification, using 5 or 6-residue sliding windows. Depending on the choice of training and testing datasets, the area under ROC curve obtained the value up to 0.80 for experimental, and 0.95 for computationally generated (with 3D profile method) datasets. Importantly, the results on 5-residue segments were not significantly worse, although the classification required that algorithm first recognized the most relevant training segments. The dataset of long sequences, such as sup35 prion and a few other amyloid proteins, were applied to test the method and gave encouraging results. Our web tool FISH Amyloid was trained on all available experimental data 4-10 residues long, offers prediction of amyloidogenic segments in protein sequences.
Conclusions
We proposed a new original classification method which recognizes co-occurrence patterns in sequences. The method reveals characteristic classification pattern of the data and finds the segments where its scoring is the strongest, also in long training sequences. Applied to the problem of amyloidogenic segments recognition, it showed a good potential for classification problems in bioinformatics.
【 授权许可】
2014 Gasior and Kotulska; licensee BioMed Central Ltd.
【 预 览 】
Files | Size | Format | View |
---|---|---|---|
20150117023128326.pdf | 596KB | download | |
Figure 5. | 41KB | Image | download |
Figure 4. | 25KB | Image | download |
Figure 3. | 56KB | Image | download |
Figure 2. | 53KB | Image | download |
Figure 1. | 81KB | Image | download |
【 图 表 】
Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5.
【 参考文献 】
- [1]Jaroniec CP, MacPhee CE, Bajaj VS, McMahon MT, Dobson CM, Griffin RG: High-resolution molecular structure of a peptide inan amyloid fibril determined by magic angle spinning NMR spectroscopy. Proc Natl Acad Sci U S A 2004, 101:711-716.
- [2]Makin OS, Atkins E, Sikorski P, Johansson J, Serpell LC: Molecular basis for amyloid fibril formation and stability. Proc Natl Acad Sci U S A 2005, 102:315-320.
- [3]Nelson R, Sawaya MR, Balbirnie M, Madsen AO, Riekel C, Grothe R, Eisenberg D: Structure of the cross- beta spine of amyloid-like fibrils. Nature 2005, 435:773-778.
- [4]Sawaya MR, Sambashivan S, Nelson R, Ivanova MI, Sievers SA, Apostol MI, Thompson MJ, Balbirnie M, Wiltzius JJW, McFarlane HT, Madsen AØ, Riekel C, Eisenberg D: Atomic structures of amyloid cross β-spines reveal varied steric zippers. Nature 2007, 447:453-457.
- [5]Thompson MJ, Balbirnie M, Wiltzius JJW, McFarlane HT, Madsen AØ, Riekel C, Eisenberg D: Atomic structures of amyloid cross β-spines reveal varied steric zippers. Nature 2007, 447:453-457.
- [6]Uversky VN, Fink AL: Conformational constraints for amyloid fibrillation: the importance of being unfolded. Biochim Biophys Acta 2004, 1698:131-153.
- [7]Rousseau F, Schymkowitz J, Serrano L: Protein aggregation and amyloidosis: confusion of the kinds? Curr Opin Struct Biol 2006, 16:118-126.
- [8]Serrano L, de la Paz Lopez M: Sequence determinants of amyloid fibril formation. Proc Natl Acad Sci U S A 2004, 101:87-92.
- [9]Fernandez-Escamilla AM, Rousseau F, Schymkowitz J, Serrano L: Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol 2004, 22:1302-1306.
- [10]Thompson MJ, Sievers SA, Karanicolas J, Ivanova MI, Baker D, Eisenberg D: The 3D profile method for identifying fibril-forming segments of proteins. Proc Natl Acad Sci U S A 2006, 103:4074-4078.
- [11]Goldschmidt L, Tenga PK, Riek R, Eisenberg D: Identifying the amylome, proteins capable of forming amyloid-like fibrils. PNAS 2010, 107:3487-3492.
- [12]Galzitskaya OV, Garbuzynskiy SO, Lobanov MY: Prediction of amyloidogenic and disordered regions in protein chains. PLoS Comput Biol 2006, 2:e177.
- [13]Garbuzynskiy SO, Lobanov MY, Galzitskaya OV: FoldAmyloid: a method of prediction of amyloidogenic regions from protein sequence. Bioinformatics 2010, 26:326-332.
- [14]Trovato A, Chiti F, Maritan A, Seno F: Insight into the structure of amyloid fibrils from the analysis of globular proteins. PLoS Comput Biol 2006, 2:e170.
- [15]Trovato A, Seno F, Tosatto SC: The PASTA server for protein aggregation prediction. Protein Eng Des Sel 2007, 20:521-523.
- [16]Conchillo-Solé O, de Groot NS, Avilés FX, Vendrell J, Daura X, Ventura S: AGGRESCAN: a server for the prediction and evaluation of "hot spots" of aggregation in polypeptides. BMC Bioinformatics 2007, 8:65. BioMed Central Full Text
- [17]Zhang Z, Chen H, Lai L: Identification of amyloid fibril-forming segments based on structure and residue-based statistical potential. Bioinformatics 2007, 23:2218-2225.
- [18]Tartaglia GG, Vendruscolo M: The Zyggregator method for predicting protein aggregation propensities. Chem Soc Rev 2008, 37:1395-1401.
- [19]Tartaglia GG, Vendruscolo M: Proteome-level interplay between folding and aggregation propensities of proteins. J Mol Biol 2010, 402:919-928.
- [20]Kim C, Choi J, Lee SJ, Welsh WJ, Yoon S: NetCSSP: web application for predicting chameleon sequences and amyloid fibril formation. Nucleic Acids Res 2009, 37:W469-W473.
- [21]O'Donnell CW, Waldispühl J, Lis M, Halfmann R, Devadas S, Lindquist S, Berger B: A method for probing the mutational landscape of amyloid structure. Bioinformatics 2011, 27:i34-i42.
- [22]Bryan AW Jr, O'Donnell CW, Menke M, Cowen LJ, Lindquist S, Berger B: STITCHER: dynamic assembly of likely amyloid and prion β-structures from secondary structure predictions. Proteins 2011, 80:410-420.
- [23]Bryan AW Jr, Menke M, Cowen LJ, Lindquist SL, Berger B: BETASCAN: probable beta-amyloids identified by pairwise probabilistic analysis. PLoS Comput Biol 2009, 5:e1000333.
- [24]Frousios KK, Iconomidou VA, Karletidi CM, Hamodrakas SJ: Amyloidogenic determinants are usually not buried. BMC Struct Biol 2009, 9:44. BioMed Central Full Text
- [25]Stanislawski J, Kotulska M, Unold O: Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides. BMC Bioinformatics 2013, 14:21. BioMed Central Full Text
- [26]Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter 2009, 11:10-18.
- [27]Maurer-Stroh S, Debulpaep M, Kuemmerer N, de la Paz Lopez M, Martins IC, Reumers J, Morris KL, Copland A, Serpell L, Serrano L, Schymkowitz JW, Rousseau F: Exploring the sequence determinants of amyloid structure using position-specific scoring matrices. Nat Methods 2010, 7:237-242.
- [28]David MP, Concepcion GP, Padlan EA: Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies. BMC Bioinformatics 2010, 11:79. BioMed Central Full Text
- [29]Kuhlman B, Baker D: Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci U S A 2000, 97:10383-10388.
- [30]server: http://services.mbi.ucla.edu/zipperdb/
- [31]server: http://bioinfo.protres.ru/fold-amyloid/amyloid_base.html
- [32]Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14:1188-1190.
- [33]server: http://waltz.switchlab.org/