Thesis Details
Feature Selection for Gene Expression Data Based on Hilbert-Schmidt Independence Criterion
Zarkoob, Hadi
University of Waterloo
Keywords: Feature selection; Hilbert-Schmidt Independence Criterion; Gene expression data; Statistics
Others  :  https://uwspace.uwaterloo.ca/bitstream/10012/5247/1/Zarkoob_Hadi.pdf
Switzerland | English
Source: UWSPACE Waterloo Institutional Repository
【 Abstract 】

DNA microarrays are capable of measuring expression levels of thousands of genes, even the whole genome, in a single experiment. Based on this, they have been widely used to extend the studies of cancerous tissues to a genomic level. One of the main goals in DNA microarray experiments is to identify a set of relevant genes such that the desired outputs of the experiment mostly depend on this set, to the exclusion of the rest of the genes. This is motivated by the fact that a biological process in a cell typically involves only a subset of genes, and not the whole genome. The task of selecting a subset of relevant genes is called feature (gene) selection. Herein, we propose a feature selection algorithm for gene expression data. It is based on the Hilbert-Schmidt independence criterion (HSIC), and partly motivated by Rank-One Downdate (R1D) and the Singular Value Decomposition (SVD). The algorithm is computationally very fast and scalable to large data sets, and can be applied to response variables of arbitrary type (categorical and continuous). Experimental results of the proposed technique are presented on some synthetic and well-known microarray data sets. Later, we discuss the capability of HSIC in providing a general framework which encapsulates many widely used techniques for dimensionality reduction, clustering and metric learning. We use this framework to explain two metric learning algorithms, namely Fisher discriminant analysis (FDA) and closed form metric learning (CFML). As a result of this framework, we are able to propose a new metric learning method. The proposed technique uses concepts from normalized cut spectral clustering and is associated with an underlying convex optimization problem.

【 Preview 】
Attachments
Files Size Format View
Feature Selection for Gene Expression Data Based on Hilbert-Schmidt Independence Criterion 1344KB PDF download
Document Metrics
Downloads: 11    Views: 34