Journal Article Details
BMC Research Notes
Support vector machine (SVM) based multiclass prediction with basic statistical analysis of plasminogen activators
Christophe Lefevre², Munish Puri², Selvaraj Muthukrishnan¹
[1] Institute of Microbial Technology, Sector-39A, Chandigarh, India; Centre for Chemistry and Biotechnology, Deakin University, Geelong, Victoria 3217, Australia
Keywords: Support vector machine; SVM; Comparative analysis; tPA; UK; SK; SAK; Tissue plasminogen activators; Urokinase; Staphylokinase; Streptokinase; Plasminogen activators; Pg-activators
DOI  :  10.1186/1756-0500-7-63
Received: 2013-09-23; accepted: 2014-01-16; published: 2014
【 Abstract 】

Background

Plasminogen (Pg), the precursor of the proteolytic and fibrinolytic enzyme of blood, is converted to the active enzyme plasmin (Pm) by different plasminogen activators (tissue plasminogen activators and urokinase), including the bacterial activators streptokinase and staphylokinase, which activate Pg to Pm and are thus used clinically for thrombolysis. The identification of Pg-activators is therefore an important step in understanding their functional mechanism and in deriving new therapies.

Methods

In this study, we investigated several computational methods for predicting plasminogen activator peptide sequences with high accuracy. These were support vector machine (SVM) classifiers based on amino acid composition (AC), dipeptide composition (DC), position-specific scoring matrix (PSSM) profiles, and Hybrid methods. They were used to predict different Pg-activators of both prokaryotic and eukaryotic origin.
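The two composition-based encodings named above can be sketched roughly as follows. This Python sketch turns a protein sequence into a 20-dimensional amino acid composition vector and a 400-dimensional dipeptide composition vector. It makes three assumptions. Values are expressed as percentages over the 20 standard residues. Non-standard symbols are skipped. The function names are hypothetical. The paper's exact normalization and its PSSM and Hybrid encodings are not reproduced here.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: AA, AC, ..., YY.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """20-dim vector: percentage of each standard residue in seq.

    Assumption: non-standard symbols (e.g. X, B, Z) are ignored.
    """
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    total = len(residues)
    return [100.0 * residues.count(aa) / total for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dim vector: percentage of each overlapping dipeptide in seq.

    Assumption: pairs containing a non-standard residue are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]
```

Each vector would then serve as the input features for an SVM classifier. One possible way to combine encodings into a hybrid input is simple concatenation (for example `amino_acid_composition(s) + dipeptide_composition(s)`, giving 420 features). That is an illustrative assumption, not the paper's stated Hybrid method.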

Results

The maximum overall accuracy, evaluated by five-fold cross-validation, was 88.37%, 84.32%, 87.61%, and 85.63% for the amino acid composition (AC), dipeptide composition (DC), PSSM profile, and Hybrid methods, respectively. The corresponding Matthews correlation coefficients (MCC) were 0.87, 0.83, 0.86, and 0.85. We also found that the different subfamilies of Pg-activators are closely correlated in terms of their amino acid, dipeptide, PSSM, and Hybrid compositions. Overall, our prediction results show that plasminogen activators can be predicted with high accuracy from their primary sequence. Prediction performance was further cross-checked using confusion matrices and receiver operating characteristic (ROC) analysis. We also implemented a web server to facilitate the prediction of Pg-activators from primary sequence data.
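The evaluation described above can be sketched roughly as follows. Sequence indices are dealt into five folds, and each fold is predicted by a model trained on the other four. The held-out predictions are pooled into one confusion matrix. Overall accuracy and one-vs-rest MCC values are then read from that matrix. This rests on several assumptions. `train_and_predict` is a hypothetical stand-in for an SVM. Pooling predictions across folds and computing MCC per class are choices made for illustration. The paper's exact fold construction, and how it summarized MCC across classes, are not stated here.

```python
import math
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def confusion_matrix(y_true, y_pred, labels):
    """Counts with rows = true class and columns = predicted class."""
    pos = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[pos[t]][pos[p]] += 1
    return m


def cross_validate(X, y, train_and_predict, labels, k=5):
    """Predict each fold with a model trained on the other k-1 folds and
    pool all held-out predictions into a single confusion matrix.

    train_and_predict(X_train, y_train, X_test) -> list of predicted labels
    (hypothetical interface standing in for an SVM).
    """
    y_pred = [None] * len(y)
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = train_and_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        for i, p in zip(test_idx, preds):
            y_pred[i] = p
    return confusion_matrix(y, y_pred, labels)


def overall_accuracy(m):
    """Percentage of all predictions that fall on the diagonal."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    return 100.0 * correct / total if total else 0.0


def one_vs_rest_mcc(m, c):
    """Matthews correlation coefficient treating class index c as positive."""
    total = sum(sum(row) for row in m)
    tp = m[c][c]
    fn = sum(m[c]) - tp                 # truly c, predicted as another class
    fp = sum(row[c] for row in m) - tp  # predicted c, truly another class
    tn = total - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Suppose the class labels are the subfamilies listed in the keywords (`["tPA", "UK", "SK", "SAK"]`). Then `one_vs_rest_mcc(m, labels.index("SK"))` gives the streptokinase-versus-rest MCC. The confusion matrix `m` is the same object that a confusion-matrix cross-check would inspect.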

Conclusion

The results show that the dipeptide, PSSM profile, and Hybrid methods perform better than the method based on amino acid composition (AC) alone. We have also developed a web server that predicts Pg-activators and their classification (available online at http://mamsap.it.deakin.edu.au/plas_pred/home.html). Our experimental results show that our approaches are fast and generally achieve good prediction performance.

【 License 】

© 2014 Muthukrishnan et al.; licensee BioMed Central Ltd.

【 Figures and Tables 】

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.
