Thesis details
Pattern extraction and clustering for high-dimensional discrete data
Jiang, Peng
Keywords: low-rank matrix factorization; binary matrix factorization; k-means clustering; approximation algorithm; pattern extraction; association rule mining; document clustering; weighted binary matrix factorization; bicluster discovery; densest k-subgraph; social network mining
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/46604/Peng_Jiang.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

We explore connections between low-rank matrix factorizations and interesting problems in data mining and machine learning. We propose a framework for solving several low-rank matrix factorization problems, including binary matrix factorization, constrained binary matrix factorization, weighted constrained binary matrix factorization, densest k-subgraph, and orthogonal nonnegative matrix factorization. These combinatorial problems are NP-hard. Our goal is to develop effective approximation algorithms with good theoretical properties and apply them to various real application problems. We reformulate each of the problems as a special clustering problem that has the same optimal solution as the corresponding original problem. Making use of this property, we develop clustering algorithms to solve the corresponding low-rank matrix factorization problems. We prove that most of our clustering algorithms have constant approximation ratios, which is a highly desirable property for NP-hard problems. We apply the proposed algorithms and compare them with existing methods for real applications in pattern extraction, document clustering, transaction data mining, recommender systems, bicluster discovery in gene expression data, social network mining, and image representation.
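
To make the clustering view of binary matrix factorization concrete, the following is a minimal sketch and not the thesis's algorithm. It assumes the simplest constrained setting, in which each row of a binary matrix X is assigned to exactly one of k binary centroids, so that X ≈ U V with a binary assignment matrix U and a binary centroid matrix V. The function name `binary_cluster_factorize`, its parameters, and the Lloyd-style alternation with majority-vote centroids are illustrative assumptions; this heuristic carries none of the approximation guarantees mentioned in the abstract.

```python
import numpy as np


def binary_cluster_factorize(X, k, n_iter=20, seed=0):
    """Approximate binary X (n x m) by U @ V, with U (n x k) having exactly
    one 1 per row and V (k x m) binary. Hypothetical helper for illustration."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Initialise the centroids from k distinct row indices of X (assumes n >= k).
    V = X[rng.choice(n, size=k, replace=False)].copy()

    def assign(V):
        # Hamming distance from every row to every centroid, shape (n, k).
        dist = (X[:, None, :] != V[None, :, :]).sum(axis=2)
        return dist.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(V)
        new_V = V.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                # A per-column majority vote minimises the Hamming error
                # of a binary centroid within its cluster.
                new_V[j] = (members.mean(axis=0) >= 0.5).astype(X.dtype)
        if np.array_equal(new_V, V):
            break
        V = new_V

    labels = assign(V)
    U = np.zeros((n, k), dtype=X.dtype)
    U[np.arange(n), labels] = 1
    return U, V


# Toy data: two binary patterns, each repeated three times.
X = np.array([[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3)
U, V = binary_cluster_factorize(X, k=2)
error = int((U @ V != X).sum())  # number of mismatched entries
```

On this toy matrix the rows of V play the role of extracted patterns and U records which pattern each row uses; `error` counts the entries the rank-2 binary product fails to reproduce.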

【 Preview 】
Attachment list
Files Size Format View
Pattern extraction and clustering for high-dimensional discrete data 10350KB PDF download