Journal article details
BMC Bioinformatics
FreeContact: fast and free software for protein contact prediction from residue co-evolution
László Kaján [3], Thomas A Hopf [1], Matúš Kalaš [4], Debora S Marks [1], Burkhard Rost [2]
[1] Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA
[2] WZW – Weihenstephan, Alte Akademie 8, Freising, Germany
[3] Department for Bioinformatics and Computational Biology, TU Munich, Boltzmannstraße 3, Garching 85748, Germany
[4] Department of Informatics, University of Bergen, Bergen 5008, Norway
Keywords: Debian package; BioXSD; mfDCA; PSICOV; EVcouplings; EVfold; Open-source software; 2D prediction; Fast protein contact prediction; Protein sequence analysis; Protein structure prediction
DOI: 10.1186/1471-2105-15-85
Received 2013-09-30, accepted 2014-03-18, published 2014
【 Abstract 】

Background

Twenty years of improved technology and a growing number of sequences now render residue-residue contact constraints, inferred from correlated mutations in large protein families, accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications. In addition, EVfold-mfDCA depends on proprietary software.
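
The idea of reading contacts from correlated mutations can be illustrated with a much simpler score than either method named above. The sketch below ranks column pairs of a toy alignment by mutual information; this is a swapped-in stand-in, not mfDCA, not PSICOV, and not FreeContact's code, and unlike those methods it does not try to separate direct from indirect correlations. The alignment `TOY_MSA` and the helper names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Return alignment column i as a list of residues."""
    return [seq[i] for seq in msa]


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        mi += p_ab * math.log(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def rank_pairs(msa):
    """Score every column pair and sort from strongest to weakest co-variation."""
    length = len(msa[0])
    scores = [
        (mutual_information(column(msa, i), column(msa, j)), i, j)
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scores, reverse=True)


# Hypothetical toy alignment: column 1 (K/R) and column 3 (E/D) always change together.
TOY_MSA = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "AKVE",
    "ARVD",
]

if __name__ == "__main__":
    for score, i, j in rank_pairs(TOY_MSA):
        print(f"columns {i},{j}: MI = {score:.3f}")
```

In this toy case the pair of columns 1 and 3 ranks first because the two positions always mutate together. Real alignments need far more sequences, and pairwise scores of this kind are exactly what the coupling-based methods were designed to improve on.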

Results

Here, we present FreeContact, a fast, open-source implementation of EVfold-mfDCA and PSICOV. On a test set of 140 proteins, FreeContact was almost eight times faster than PSICOV without decreasing prediction performance. The EVfold-mfDCA implementation of FreeContact was over 220 times faster than PSICOV with a negligible decrease in performance. EVfold-mfDCA itself was unavailable for testing due to its dependency on proprietary software. FreeContact is implemented as the free C++ library “libfreecontact”, complete with the command-line tool “freecontact”, as well as Perl and Python modules. All components are available as Debian packages. FreeContact supports the BioXSD format for interoperability.

Conclusions

FreeContact provides the opportunity to compute reliable contact predictions in any environment (desktop or cloud).

【 License 】

   
2014 Kaján et al.; licensee BioMed Central Ltd.
