Multivariate Statistical Analysis of Protein Variation
basic helix-loop-helix;multivariate analysis;sequence metric problem
Zhao, Jieping; Bruce S. Weir, Committee Member; Zhao-Bang Zeng, Committee Member; Thomas M. Gerig, Committee Member; William R. Atchley, Committee Chair
The purpose of this research is to study the solution to the protein sequence metric problem and to apply it to explore the structural, functional, and evolutionary aspects of the basic helix-loop-helix (bHLH) protein family. The sequence metric problem arises from the alphabetic coding of amino acids and has long hindered efficient protein sequence analysis. This dissertation begins by revisiting the solution to the sequence metric problem proposed by Atchley et al. (2005) [PNAS 102(18):6401-6]. Several unresolved issues were studied, including information loss, model robustness, and concordance between factor analysis and principal component analysis. In addition, the classification of the 20 amino acids was explored in the numerical factor space derived by Atchley et al. (2005).

The next two parts of the dissertation focus on computational and statistical studies of the bHLH protein family. All protein sequence data were transformed into numerical vectors using the amino acid factor scores from the sequence metric solution.

In the second part, protein sequence variability at the level of statistically supported lineages (clades) was studied using stepwise discriminant analysis. Sites important for discriminating among clades were selected, and hierarchical classification strategies for the clades were proposed. In the third part, 147 Arabidopsis bHLH proteins were studied with a series of multivariate analyses. The results showed significant sequence differences between plant (e.g. Arabidopsis) and animal bHLH proteins, and some of the contributing discriminant sites were selected and discussed. The binding property of each Arabidopsis bHLH protein was assigned using classification rules derived from animal bHLH proteins.
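As an illustration of the transformation step described in the abstract, the following minimal sketch shows how a protein sequence might be mapped to a numerical vector by replacing each residue with its factor scores. It is not the dissertation's implementation. The function name `encode_sequence` and the table `TOY_FACTOR_SCORES` are hypothetical, and the numbers in the table are made-up placeholders, not the published scores of Atchley et al. (2005).

```python
from typing import Dict, List, Sequence

# Placeholder scores for illustration only. These are NOT the published
# Atchley et al. (2005) values. Two factors per residue are used here for
# brevity; the published solution reports five factors per amino acid.
TOY_FACTOR_SCORES: Dict[str, Sequence[float]] = {
    "A": (0.1, -0.2),
    "R": (1.0, 0.5),
    "K": (0.9, 0.4),
    "L": (-0.8, 0.3),
}


def encode_sequence(seq: str, scores: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the factor scores of each residue into one flat vector."""
    vector: List[float] = []
    for residue in seq.upper():
        if residue not in scores:
            # Unknown symbols (including alignment gaps) are rejected here;
            # a real analysis would need an explicit convention for them.
            raise KeyError(f"no factor scores for residue {residue!r}")
        vector.extend(scores[residue])
    return vector


encoded = encode_sequence("ARKL", TOY_FACTOR_SCORES)
print(len(encoded))  # number of residues times number of factors
```

With a real score table, each aligned site contributes one numerical variable per factor, which is the form of input that multivariate methods such as stepwise discriminant analysis require.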