Journal Article Details
BMC Structural Biology
Designing and benchmarking the MULTICOM protein structure prediction system
Jianlin Cheng [1], Jesse Eickholt [2], Xin Deng [2], Jilong Li [2]
[1] C. Bond Life Science Center, University of Missouri, Columbia, MO, USA
[2] Computer Science Department, University of Missouri, Columbia, MO, USA
Keywords: Model refinement; Model combination; Model assessment; Model generation; Template combination; Template identification; Protein structure prediction
DOI: 10.1186/1472-6807-13-2
Received 2012-10-16, accepted 2013-02-21, published 2013
【 Abstract 】

Background

Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor.

Results

Over the past several years, we have constructed a standalone protein structure prediction system, MULTICOM, that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process: template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and yielded very good performance. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction.
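
The five stages named above can be pictured as a simple chain in which each stage consumes the output of the previous one. The following Python sketch is only an illustration of that staging under loud assumptions: every function name, the `Model` type, and the toy scoring rule are hypothetical placeholders and are not taken from the MULTICOM software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Model:
    """Hypothetical stand-in for a predicted 3D model."""
    name: str
    score: float = 0.0


def identify_templates(sequence: str) -> List[str]:
    # Placeholder: a real system would search a template library here.
    return [f"template_{i}" for i in range(3)]


def combine_templates(templates: List[str]) -> List[List[str]]:
    # Placeholder: build nested template sets (top 1, top 2, top 3, ...).
    return [templates[:k] for k in range(1, len(templates) + 1)]


def generate_models(sequence: str, template_sets: List[List[str]]) -> List[Model]:
    # Placeholder: one model per template set.
    return [Model(name="+".join(ts)) for ts in template_sets]


def assess_models(models: List[Model]) -> List[Model]:
    # Toy score (an assumption): the number of templates a model was built from.
    for m in models:
        m.score = float(m.name.count("+") + 1)
    return sorted(models, key=lambda m: m.score, reverse=True)


def refine_model(model: Model) -> Model:
    # Placeholder: a real refinement step would adjust coordinates.
    return Model(name=model.name + ":refined", score=model.score)


def predict(sequence: str) -> Model:
    """Run the five stages in order and return the refined top-ranked model."""
    templates = identify_templates(sequence)
    template_sets = combine_templates(templates)
    models = generate_models(sequence, template_sets)
    ranked = assess_models(models)
    return refine_model(ranked[0])
```

Keeping each stage behind its own function is one way to swap in or compare alternative methods per stage, which is the kind of component-level study the abstract describes.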

Conclusions

Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/

【 License 】

2013 Li et al; licensee BioMed Central Ltd.
