The field of molecular evolution has progressed with the accumulation of various molecular data. It started with the analysis of protein sequence data, followed by that of gene and genome sequence data. Recently, structural genomics and proteomics have offered new types of data for addressing questions in molecular evolution. Structural genomics refers to the genome-wide collection of protein structures, whereas proteomics is the study of all proteins in a cell or organism. In this thesis, I conducted molecular evolutionary projects using data provided by structural genomics and proteomics. First, I used protein structure information to explain why some human disease-associated amino acid residues (DARs) appear as the wild-type in other species. Because destabilizing protein structures is a primary reason why DARs are deleterious, I focused on protein stability and discovered that, in species where a DAR represents the wild-type, the destabilizing effect of the DAR is generally lessened by the observed amino acid substitutions in the spatial proximity of the DAR. This finding of compensatory residue substitutions has important implications for understanding epistasis in protein evolution. Second, the recently published human proteomes include peptides encoded by annotated pseudogenes, which are relics of formerly functional genes. These translated pseudogenes may actually be functional and subject to purifying selection. Alternatively, their translation may be accidental and not indicate functionality. My analysis suggests that a sizable fraction of the translated pseudogenes are subject to purifying selection acting at the protein level. Third, for the purpose of understanding protein evolution and structure-function relationships, protein structures are classified according to their structural similarities. A fold encompasses protein structures with similar core topologies. Current fold classifications implicitly assume that folds are discrete islands in the protein structure space, whereas increasing evidence supports a continuous fold space. I developed a likelihood method to classify structures into existing folds by considering the continuity in fold space. My results using this method demonstrated the growing importance of considering this continuity in fold classification. Together, my work illustrated the utility of structural genomics and proteomics in answering evolutionary questions and provided a better understanding of gene and protein evolution.
Molecular Evolutionary Studies using Structural Genomics and Proteomics.