The proliferation of massively parallel nucleotide sequencing and increases in the throughput of mass spectrometry have produced an unprecedented volume of highly specific, highly accurate data elucidating the transcriptome and proteome. This data explosion has facilitated a tremendous number of novel discoveries in both disease and basic biology. It has also presented a number of challenges due to the characteristics of these cutting-edge technologies.

Across these studies, we focus on the context of human cancer, where these technologies are increasingly being used to characterize and target molecular aberrations for treatment tailored to individuals’ cancer biology. First, we evaluate the emerging technology of single-molecule sequencing (SMS), which may provide a clearer picture of the biological activity in the cell by avoiding the sample amplification steps that may introduce biases in the data.

We then turn to the challenge of integrating NGS-derived transcriptome data with tandem mass spectrometry data quantifying the proteome. We developed a framework for integrating data from these two realms using a novel common reference built from corresponding transcript and protein sequences. We apply this framework to integrate data derived from the RWPE and VCaP prostate cell lines and show how a number of methodological factors and sources of error can affect the correlation between transcript and protein.

Finally, we analyze the results of our data integration pipeline with a focus on the transcript-protein relationship. We classify the genes in our dataset into broad categories and show how their biological roles, as well as experimental characteristics, affect the relationship we observe between transcript and protein. We then compare the cell lines in terms of their genes’ transcript-protein relationship, with the goal of uncovering the uncoupling of this relationship in prostate cancer.

The results and methods derived from this work can be used by researchers in the future to better understand the characteristics of emerging NGS technologies and to integrate these data across scales of biology to better understand the molecular underpinnings of disease.
Integrative Bioinformatics in the Age of Massive Throughput Sequencing: From the Transcriptome to the Proteome in Prostate Cancer.