Journal Article Details
Statistical Analysis and Data Mining
Sparse Fisher's linear discriminant analysis for partially labeled data
Qiyi Lu [1]
[1] Department of Mathematical Sciences, Binghamton University, State University of New York, Binghamton, New York 13902-6000
Keywords: Bayes decision rule; classification; clustering; difference‐convex algorithm; high‐dimensional low‐sample‐size data; semisupervised learning; sparsity
DOI: 10.1002/sam.11367
Subject classification: Social Sciences, Humanities and Arts (General)
Source: John Wiley & Sons, Inc.
【 Abstract 】

Classification is an important tool with many useful applications. Fisher's linear discriminant analysis (LDA) is a traditional model‐based classification method which makes use of the Gaussian distributional information. However, in the high‐dimensional, low‐sample‐size setting, LDA cannot be directly deployed because the sample covariance is not invertible. While there are modern methods for high‐dimensional data, they may not fully use the information as LDA does. Hence in some situations, it is still desirable to use a model‐based method for classification. This paper exploits the potential of LDA in a more complicated data setting. In many real applications, it is costly to manually place labels on observations; consequently, often only a small portion of labeled data is available while a large number of observations are left without labels. It is a great challenge to obtain good classification performance through the labeled data alone, especially in the high‐dimensional setting. In order to overcome this issue, we propose a semisupervised sparse LDA classifier to take advantage of the seemingly useless unlabeled data, which helps to boost the classification performance in some situations. A direct estimation method is used to reconstruct LDA and achieve sparsity; meanwhile we employ the difference‐convex algorithm to handle the nonconvex loss function associated with the unlabeled data. Theoretical properties of the proposed classifier are studied. Our simulated examples help understand when and how the information extracted from the unlabeled data can be useful. A real data example further illustrates the usefulness of the proposed method.
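The abstract's point that classical LDA breaks down when there are fewer observations than features can be checked numerically. The following is a minimal sketch, not the paper's semisupervised sparse classifier: it assumes two Gaussian classes with equal priors and identity covariance, and the function names, sample sizes, and mean shift are all hypothetical choices made for illustration. It builds the pooled sample covariance, applies the textbook plug-in LDA rule when the matrix is invertible, and shows that the rank falls below the dimension once p exceeds n.

```python
import numpy as np


def pooled_covariance(X, y):
    """Pooled within-class sample covariance for labels in {0, 1}."""
    n, p = X.shape
    S = np.zeros((p, p))
    for k in (0, 1):
        Xk = X[y == k]
        centered = Xk - Xk.mean(axis=0)
        S += centered.T @ centered
    return S / (n - 2)


def simulate(n, p, rng, shift=2.0, informative=3):
    """Hypothetical data: only the first few features separate the classes."""
    y = np.repeat([0, 1], n // 2)
    X = rng.normal(size=(n, p))
    X[y == 1, :informative] += shift
    return X, y


rng = np.random.default_rng(0)

# Classical setting (n much larger than p): the plug-in LDA rule is well defined.
X_lo, y_lo = simulate(n=200, p=5, rng=rng)
mu0 = X_lo[y_lo == 0].mean(axis=0)
mu1 = X_lo[y_lo == 1].mean(axis=0)
S_lo = pooled_covariance(X_lo, y_lo)
beta = np.linalg.solve(S_lo, mu1 - mu0)        # discriminant direction
scores = (X_lo - (mu0 + mu1) / 2) @ beta       # positive score -> class 1
train_accuracy = np.mean((scores > 0).astype(int) == y_lo)

# High-dimensional, low-sample-size setting (p > n): the covariance is singular.
n_hd, p_hd = 20, 50
X_hd, y_hd = simulate(n=n_hd, p=p_hd, rng=rng)
S_hd = pooled_covariance(X_hd, y_hd)
rank_hd = np.linalg.matrix_rank(S_hd)          # at most n_hd - 2, so below p_hd

print(f"low-dimensional training accuracy: {train_accuracy:.2f}")
print(f"rank of pooled covariance: {rank_hd} (dimension {p_hd})")
```

Because the rank of the pooled covariance cannot exceed n - 2, the linear system that defines the discriminant direction has no unique solution when p > n, which is the obstacle that sparse, directly estimated variants of LDA are meant to work around.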

【 License 】

Unknown
