Cancer is one of the leading causes of death worldwide. In recent years, with the aid of high-throughput genomic technologies, large cohorts of tumor samples have been analyzed to characterize molecular aberrations in many cancer types. These studies have generated an enormous amount of cancer genomics data, providing not only new opportunities to understand tumor evolution and the mechanisms of cancer progression but also new challenges in analyzing the data efficiently and rigorously. Heterogeneity is an important feature of cancer and has a significant impact on the diagnosis and treatment of the disease. My dissertation focuses on developing new bioinformatics and biostatistical approaches to study the heterogeneity and evolutionary history of cancer genomes. Under this theme, my thesis consists of four main chapters.

First, I developed an algorithm to infer the mixing ratio of aneuploid and euploid cells from allele-specific DNA copy number alteration (CNA) data, and made the striking discovery that gene expression patterns in brain and ovarian tumors are strongly influenced by aneuploid content. The ability to infer mixing ratios allowed me to revise the current classification system for glioblastoma, yielding better prediction of clinical outcome than previous classifications. Second, I developed the Clonal Heterogeneity Analysis Tool (CHAT), which estimates cellular fractions for individual CNAs and individual somatic mutations, allowing us to use the distribution of these fractions to infer the macroscopic clonal architecture and the relative order in which somatic changes occurred. For example, a CNA present in a larger fraction of the cell population may have occurred earlier in tumor development or conferred a greater growth rate, and is therefore more likely to contain driver genes. Third, I developed a method to detect short tandem repeat (STR) variation using paired-end, short-read, next-generation DNA sequencing data. Unlike previous methods, which are limited to finding short STR alleles, my method can find both STR alleles shorter than a read and those longer than a read or a read pair. This capability addresses the need to reliably detect expanded STR alleles in germline DNA, which underlie many rare inherited diseases, as well as somatic aberrations characterized by microsatellite instability.
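As a rough illustration of the mixing-ratio idea in the first chapter, the sketch below shows how an aneuploid cell fraction could be read off the allelic imbalance in a single genomic region. It is not the dissertation's algorithm. It assumes a simplified model in which aneuploid cells carry a hemizygous deletion (one copy, major allele only) and euploid cells carry one copy of each allele, so that the minor-allele fraction b at a heterozygous site satisfies b = (1 - p) / (2 - p) for aneuploid fraction p. The function names are hypothetical.

```python
def expected_minor_allele_fraction(aneuploid_fraction: float) -> float:
    """Minor-allele fraction expected in a hemizygous-deletion region.

    Assumed model (illustrative only): aneuploid cells have copy numbers
    (major=1, minor=0); euploid cells have (major=1, minor=1).
    """
    p = aneuploid_fraction
    if not 0.0 <= p <= 1.0:
        raise ValueError("aneuploid fraction must lie in [0, 1]")
    minor_copies = 1.0 - p   # only euploid cells contribute the minor allele
    total_copies = 2.0 - p   # p * 1 copy + (1 - p) * 2 copies
    return minor_copies / total_copies


def aneuploid_fraction_from_minor_allele_fraction(b: float) -> float:
    """Invert b = (1 - p) / (2 - p) to obtain p = (1 - 2b) / (1 - b)."""
    if not 0.0 <= b <= 0.5:
        raise ValueError("minor-allele fraction must lie in [0, 0.5]")
    return (1.0 - 2.0 * b) / (1.0 - b)


if __name__ == "__main__":
    # Round trip: simulate a sample with 60% aneuploid cells, then recover p.
    b = expected_minor_allele_fraction(0.6)
    print(round(b, 4), round(aneuploid_fraction_from_minor_allele_fraction(b), 4))
```

Real data would call for combining many regions with different copy-number states and accounting for measurement noise; the single-region inversion here only shows why allele-specific signal carries information about the mixing ratio.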
Development and Application of Novel Methods to Study Tumor Heterogeneity and Cancer Genome Evolution.