Li, Li; Michael D. Purugganan, Committee Member; Bruce S. Weir, Committee Chair; Sharon R. Browning, Committee Member; Zhao-Bang Zeng, Committee Member; Margarate G. Ehm, Committee Member
Disease gene mapping is one of the main focuses of genetic epidemiology and statistical genetics. This dissertation explores methods and algorithms in this area, with an emphasis on pedigree data. The first chapter gives an introduction to human genetics and disease gene mapping. Existing linkage and association methods are introduced and compared.

Probabilities of genotypic data from multiple linked marker loci on related individuals are used as likelihoods of gene locations for gene mapping, or as likelihoods of other parameters of interest in human genetics. With recent developments in genetic and molecular biology techniques, large-scale marker data have become available, and these require highly efficient likelihood calculations, especially for complex pedigrees.

Algorithms for likelihood calculations on pedigree data are reviewed in Chapter 2. Besides exact likelihood calculation methods and MCMC, a Sequential Importance Sampling (SIS) approach has been proposed to enable calculations for large pedigrees with large numbers of markers.

However, as the system grows, the variance of the importance sampling weights increases, and both the efficiency and the accuracy of the method decrease. In Chapter 3, we propose an optimization algorithm for calculating the likelihood of general pedigrees. We incorporate a resampling strategy into SIS to reduce this variance inflation.

A successful linkage analysis may identify a region of interest spanning perhaps ten to thirty centiMorgans and containing hundreds of genes. A follow-up association (or so-called linkage disequilibrium) analysis can provide much finer gene mapping but is subject to greater multiple testing problems. In Chapter 4, we present a method for determining whether an association result is responsible for a non-parametric linkage result for binary traits in general pedigrees.
The correlation between family frequency of a variant of interest and family LOD score is used as a measure of whether the association between a given variant at a marker and the disease status can help to explain a significant linkage result seen in the collection of families in the region around the marker.
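
As a rough illustration of the kind of measure described above, the following minimal sketch computes a correlation between per-family variant frequencies and per-family LOD scores. It assumes a plain Pearson correlation; the dissertation may define or weight the measure differently. The function name `pearson_correlation` and the example values are hypothetical and are not taken from the dissertation.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers.

    Assumption: an unweighted Pearson correlation stands in for the
    family-level measure; the actual method may differ.
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two families are required")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical illustrative data: one value per family.
family_variant_freq = [0.10, 0.25, 0.40, 0.55, 0.70]  # frequency of the variant within each family
family_lod = [-0.2, 0.1, 0.5, 0.9, 1.3]               # LOD score contributed by each family

r = pearson_correlation(family_variant_freq, family_lod)
print(f"correlation between family frequency and family LOD: {r:.3f}")
```

In this sketch, a strongly positive value would suggest that families carrying the variant more often also contribute more to the linkage signal, which is the pattern expected if the associated variant helps to explain the linkage result.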