Journal article details

BMC Bioinformatics
Capturing coevolutionary signals in repeat proteins

Rocío Espada², R Gonzalo Parra¹, Thierry Mora³, Aleksandra M Walczak⁴, Diego U Ferreiro¹
[1] Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
[2] Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
[3] Laboratoire de physique statistique, CNRS, UPMC and École normale supérieure, 24 rue Lhomond, Paris 75005, France
[4] 24 rue Lhomond, Paris 75005, France
Keywords: Co-evolution; Direct information; Repeat proteins; Direct coupling analysis
Others: 1231818; DOI: 10.1186/s12859-015-0648-3

Received 2015-03-17, accepted 2015-06-16, published 2015

【Abstract】

Background

The analysis of correlations of amino acid occurrences in globular domains has led to the development of statistical tools that can identify native contacts – portions of the chains that come to close distance in folded structural ensembles. Here we introduce a direct coupling analysis for repeat proteins – natural systems for which the identification of folding domains remains challenging.

Results

We show that the inherent translational symmetry of repeat protein sequences introduces a strong bias in the pair correlations at precisely the length scale of the repeat-unit. Equalizing for this bias in an objective way reveals true co-evolutionary signals from which local native contacts can be identified. Importantly, parameter values obtained for all other interactions are not significantly affected by the equalization. We quantify the robustness of the procedure and assign confidence levels to the interactions, identifying the minimum number of sequences needed to extract evolutionary information in several repeat protein families.

Conclusions

The overall procedure can be used to reconstruct the interactions at distances larger than repeat-pairs, identifying the characteristics of the strongest couplings in each family, and can be applied to any system that appears translationally symmetric.

【License】

2015 Espada et al.

Attachments
Files	Size	Format	View
Fig. 5.	36KB	Image	download
Fig. 4.	32KB	Image	download
Fig. 3.	113KB	Image	download
Fig. 2.	34KB	Image	download
Fig. 1.	43KB	Image	download

【Figures】

Fig. 1.

Fig. 2.

Fig. 3.

Fig. 4.

Fig. 5.
