Thesis Details
Methods for Collection and Processing of Gene Expression Data
Keywords: Andrews curves; bioconjugations; clustering; gene expression; Lorentzian distributions; MAGE; mixture modeling; mRNA quantification; multivariate data; oligonucleotides; peptoids; visualization
Murphy, John Frank; Davis, Mark E.
University: California Institute of Technology
Department: Chemistry and Chemical Engineering
Full text: https://thesis.library.caltech.edu/2723/1/JFM_Thesis_Complete.pdf
Country: United States; Language: English
Source: Caltech THESIS
Abstract

Examination of the transcriptional messages encoded in the manifold of mRNA molecules within a cell is a central task of molecular biology and functional genomics. This examination can be broken down into two parts: collection of gene expression data and analysis of those data. Here, a new method for collecting gene expression data and two new methods for analyzing those data are presented.

A new method for quantifying gene expression, called Mass-spectrometric Analysis of Gene Expression (MAGE), is developed. MAGE relies on novel conjugates of DNA oligonucleotide 30-mers: each unique sequence is conjugated via a photolabile linker to an N-substituted glycine oligomer (peptoid) of unique mass. Deuterated bromoacetic acid is incorporated into some peptoids, yielding two chemically identical probe conjugates of different molecular weights for each nucleic acid sequence of interest. Mixtures of these probes, along with 3'-adjacent biotin-labeled oligonucleotides, are used to interrogate a target mixture of cDNA. Following hybridization, the two adjacent probes are ligated, which enhances the specificity of the identification and enables the use of a biotin-affinity column to remove confounding peptoid tags. The resulting mixture is exposed to long-wave ultraviolet light to release the peptoid tags, which are then quantified by MALDI-TOF mass spectrometry with the isotopically labeled peptoids serving as internal standards. Each of these individual components of MAGE is demonstrated.
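The internal-standard quantification step can be sketched as follows. This is a minimal Python illustration, not the procedure used in the thesis. It assumes singly charged tag ions (so m/z shifts equal mass shifts), that each peptoid residue built from deuterated bromoacetic acid (BrCD2COOH) carries two deuterium atoms in its backbone CH2 group, that the light and heavy tags ionize equally well because they are chemically identical, and that a known amount of the heavy tag is present as the internal standard. The function names, the 0.5 m/z tolerance, and the toy spectrum are hypothetical.

```python
# Monoisotopic masses (Da) of protium (1H) and deuterium (2H).
H_MASS = 1.00782503
D_MASS = 2.01410178
D_SHIFT = D_MASS - H_MASS  # mass gained per H -> D substitution


def heavy_tag_mz(light_mz, n_deuterated_residues):
    """m/z of the deuterated twin of a singly charged peptoid tag.

    Assumption (hypothetical labeling pattern): each residue made from
    BrCD2COOH contributes a CD2 backbone group, i.e. two H -> D substitutions.
    """
    return light_mz + 2 * n_deuterated_residues * D_SHIFT


def peak_intensity(spectrum, target_mz, tol=0.5):
    """Largest intensity within +/- tol of target_mz in (m/z, intensity) pairs; 0.0 if none."""
    return max((inten for mz, inten in spectrum if abs(mz - target_mz) <= tol), default=0.0)


def quantify_tag(spectrum, light_mz, n_deuterated_residues, standard_amount, tol=0.5):
    """Estimate the amount of a light tag from its intensity ratio to the heavy standard.

    Assumes identical ionization of light and heavy tags, so amount ratio == intensity ratio.
    """
    light = peak_intensity(spectrum, light_mz, tol)
    heavy = peak_intensity(spectrum, heavy_tag_mz(light_mz, n_deuterated_residues), tol)
    if heavy <= 0:
        raise ValueError("internal-standard peak not found")
    return standard_amount * light / heavy


# Toy spectrum (hypothetical): light tag at m/z 1500.0, heavy standard with 4 deuterated residues.
spectrum = [(1500.0, 800.0), (1508.05, 400.0)]
print(quantify_tag(spectrum, 1500.0, 4, standard_amount=10.0))  # 20.0
```

With four deuterated residues the heavy twin of a tag at m/z 1500.0 sits near m/z 1508.05, so the two peaks are easily resolved; if the light peak is twice as intense as a standard spiked at 10 units, the estimate is 20 units.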

Strategies for simplifying and visualizing high-dimensional gene expression data, and for inferring the presence of clusters within those data, are formulated and implemented. To visualize high-dimensional gene expression data, principal components analysis is used, and the data are subsequently mapped onto an orthogonal set of basis functions known as Andrews curves. This analysis method is demonstrated by visualizing breast cancer tumor data and yeast sporulation data.

To cluster gene expression data, the expectation-maximization algorithm is employed to optimize the parameters of a mixture model of Lorentzian distributions. The difference between Lorentzian and Gaussian mixture models is first demonstrated with artificial data, and the Lorentzian model is then applied to yeast sporulation data. The results indicate that mixtures of Lorentzian distributions may have significant utility for gene expression analysis.
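Both analysis steps can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's implementation. PCA is computed from an SVD of the column-centered data; the Andrews basis uses the classical ordering 1/√2, sin t, cos t, sin 2t, cos 2t, and so on; and each "Lorentzian" component is taken to be a multivariate Cauchy distribution (a Student t with one degree of freedom), fitted with the standard EM algorithm for t mixtures, in which a latent weight u down-weights points far from a component. The thesis may define multivariate Lorentzian components differently (for example, as products of univariate Lorentzians). The function names, the farthest-point initialization, the diagonal starting covariance, and the small ridge terms are hypothetical choices.

```python
import math
import numpy as np


def pca_scores(X, k):
    """Project the rows of X (genes x conditions) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def andrews_curves(scores, t):
    """f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..."""
    k = scores.shape[1]
    basis = np.empty((k, t.size))
    basis[0] = 1.0 / math.sqrt(2.0)
    for j in range(1, k):
        freq = (j + 1) // 2
        basis[j] = np.sin(freq * t) if j % 2 == 1 else np.cos(freq * t)
    return scores @ basis  # one curve (row) per gene


def cauchy_logpdf(X, mu, cov):
    """Multivariate Cauchy (Student t, nu = 1) log-density and squared Mahalanobis distances."""
    p = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mu).T)  # whitened residuals, shape (p, n)
    delta = np.sum(z ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    const = math.lgamma((1 + p) / 2) - math.lgamma(0.5) - 0.5 * p * math.log(math.pi)
    return const - 0.5 * logdet - 0.5 * (1 + p) * np.log1p(delta), delta


def fit_cauchy_mixture(X, n_components, n_iter=200, tol=1e-8, seed=0):
    """EM for a mixture of multivariate Cauchy components.

    Returns (weights, means, covariances, responsibilities).
    """
    n, p = X.shape
    rng = np.random.default_rng(seed)
    # Farthest-point initialization of the component locations.
    idx = [int(rng.integers(n))]
    for _ in range(1, n_components):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        idx.append(int(np.argmax(d2)))
    mu = X[idx].astype(float)
    cov0 = np.diag(X.var(axis=0)) + 1e-6 * np.eye(p)
    cov = np.repeat(cov0[None], n_components, axis=0)
    weights = np.full(n_components, 1.0 / n_components)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities tau and latent weights u = (nu + p) / (nu + delta), nu = 1.
        logp = np.empty((n, n_components))
        delta = np.empty((n, n_components))
        for k in range(n_components):
            logp[:, k], delta[:, k] = cauchy_logpdf(X, mu[k], cov[k])
        logp += np.log(weights)
        top = logp.max(axis=1, keepdims=True)
        lse = top + np.log(np.exp(logp - top).sum(axis=1, keepdims=True))
        tau = np.exp(logp - lse)
        u = (1.0 + p) / (1.0 + delta)
        # M-step: closed-form updates in which distant points are down-weighted by u.
        for k in range(n_components):
            w = tau[:, k] * u[:, k]
            mu[k] = w @ X / w.sum()
            d = X - mu[k]
            cov[k] = (w[:, None] * d).T @ d / tau[:, k].sum() + 1e-9 * np.eye(p)
        weights = tau.mean(axis=0)
        loglik = float(lse.sum())  # observed-data log-likelihood
        if loglik - prev < tol:
            break
        prev = loglik
    return weights, mu, cov, tau
```

Cluster labels follow from `tau.argmax(axis=1)`, and each row returned by `andrews_curves` can be plotted against `t` to give one curve per gene. Setting u to 1 in the M-step and using a Gaussian density in the E-step recovers ordinary Gaussian mixture EM, which makes the two models straightforward to compare on the same data.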

The tools demonstrated here offer distinct advantages over the experimental and analytical tools currently used by investigators in functional genomics.
