Journal Article Details
BMC Bioinformatics
DNdisorder: predicting protein disorder using boosting and deep networks
Jesse Eickholt [2], Jianlin Cheng [1]
[1] C. Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA
[2] Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
Keywords: Deep learning; Deep networks; Disordered regions; Protein disorder prediction
DOI: 10.1186/1471-2105-14-88
Received 2012-07-26, accepted 2013-02-28, published 2013
【 Abstract 】

Background

A number of proteins contain regions that do not adopt a stable tertiary structure in their native state. Such regions, known as disordered regions, have been shown to participate in many vital cell functions and are increasingly being examined as drug targets.

Results

This work presents a new sequence-based approach for the prediction of protein disorder. The method, which participated in the CASP10 experiment, uses boosted ensembles of deep networks to make predictions. In a 10-fold cross-validation procedure on a dataset of 723 proteins, the method achieved an average balanced accuracy of 0.82 and an area under the ROC curve of 0.90. These results are achieved in part by a boosting procedure that steadily increases balanced accuracy and the area under the ROC curve over several rounds. The method also performed competitively when evaluated against a number of state-of-the-art disorder predictors on the CASP9 and CASP10 benchmark datasets.
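
The two evaluation metrics reported above can be illustrated with a minimal sketch. It assumes the usual definitions: balanced accuracy as the mean of sensitivity and specificity over per-residue labels (1 = disordered, 0 = ordered), and the area under the ROC curve as the probability that a randomly chosen disordered residue receives a higher score than a randomly chosen ordered one. The function names, the 0.5 decision threshold, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def balanced_accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = disordered)."""
    pos = sum(1 for y in labels if y == 1)
    neg = sum(1 for y in labels if y == 0)
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    sensitivity = tp / pos  # fraction of disordered residues recovered
    specificity = tn / neg  # fraction of ordered residues recovered
    return 0.5 * (sensitivity + specificity)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy per-residue data (hypothetical): true labels and predicted disorder scores.
labels = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.3, 0.6, 0.1, 0.8, 0.2, 0.45]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(balanced_accuracy(labels, preds))
print(roc_auc(labels, scores))
```

Balanced accuracy is a natural choice here because ordered residues typically far outnumber disordered ones, so plain accuracy would reward a predictor that labels everything as ordered. The pairwise AUC above is quadratic in the number of residues; it is meant for clarity, not for large datasets.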

Conclusions

DNdisorder is available as a web service at http://iris.rnet.missouri.edu/dndisorder/.

【 License 】

   
2013 Eickholt and Cheng; licensee BioMed Central Ltd.
