IUCrJ
findMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM
Grzegorz Chojnowski [1], Wolfram Seifert-Davila [2], Daniel J. Rigden [3], Adam J. Simpkin [3], Dan E. Vivas-Ruiz [4], Ronan M. Keegan [5], Diego A. Leonardo [6]
[1] European Molecular Biology Laboratory, Hamburg Unit, Notkestrasse 85, 22607 Hamburg, Germany
[2] European Molecular Biology Laboratory, Meyerhofstraße 1, 69117 Heidelberg, Germany
[3] Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
[4] Laboratorio de Biología Molecular, Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Avenida Venezuela Cdra 34 S/N, Ciudad Universitaria, Lima, Peru
[5] Rutherford Appleton Laboratory, Research Complex at Harwell, UKRI-STFC, Didcot OX11 0FA, United Kingdom
[6] São Carlos Institute of Physics, University of São Paulo, Avenida João Dagnone 1100, São Carlos, SP 13563-120, Brazil
Keywords: protein structures; protein sequences; SIMBAD; cryo-EM; bioinformatics; structure determination; findMySequence; neural networks
DOI: 10.1107/S2052252521011088
Source: DOAJ
【 Abstract 】
Although experimental protein-structure determination usually targets known proteins, chains of unknown sequence are often encountered. They can be purified from natural sources, appear as an unexpected fragment of a well characterized protein or appear as a contaminant. Regardless of the source of the problem, the unknown protein always requires characterization. Here, an automated pipeline is presented for the identification of protein sequences from cryo-EM reconstructions and crystallographic data. The method's application to characterize the crystal structure of an unknown protein purified from a snake venom is presented. It is also shown that the approach can be successfully applied to the identification of protein sequences and validation of sequence assignments in cryo-EM protein structures.
【 License 】
Unknown