Dissertation Details
Nonnegative matrix factorization for text, graph, and hybrid data analytics
Keywords: Constrained low rank approximation; Nonnegative matrix factorization; Data analytics; Content clustering; Graph clustering
Author: Du, Rundong; Advisor: Park, Haesun; Committee: Chau, Duen Horng (Polo); Chow, Edmond; Kang, Sung Ha; Zhou, Hao-Min
University: Georgia Institute of Technology
Department: Mathematics
Full text (PDF): https://smartech.gatech.edu/bitstream/1853/59914/1/DU-DISSERTATION-2018.pdf
Country: United States | Language: English
Source: SMARTech Repository
Abstract

Constrained low rank approximation is a general framework for data analysis that is usually simple, fast, scalable, and domain-general. One of the best-known constrained low rank approximation methods is nonnegative matrix factorization (NMF). This research studies the design and implementation of several variants of NMF for text, graph, and hybrid data analytics, addressing challenges that include solving new data analytics problems and improving the scalability of existing NMF algorithms. Data are commonly represented by one of two types of matrix: a feature-data matrix or a similarity matrix. Previous work has successfully applied standard NMF to feature-data matrices in areas such as text mining and image analysis, and Symmetric NMF (SymNMF) to similarity matrices in areas such as graph clustering and community detection. In this work, a divide-and-conquer strategy is applied to both methods, reducing the growth of their time complexity with respect to the reduced rank of the approximation from cubic to linear; the resulting methods are DC-NMF and HierSymNMF2. Extensive experiments on large-scale real-world data show improved performance of these two methods. Furthermore, this work combines NMF and SymNMF into one formulation, called JointNMF, to analyze hybrid data that contain both text content and connection structure information. Typical hybrid data to which JointNMF can be applied include paper/patent data, where citations connect the content items, and email data, where the sender/recipient relation is represented by a hypergraph and the email content is associated with the hypergraph edges. JointNMF can additionally predict unknown network information, which is illustrated on several real-world problems such as citation recommendation for papers and activity/leader detection in organizations. This dissertation also briefly discusses the relationships among different variants of NMF.
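The variants described above all start from the standard NMF problem: given a nonnegative feature-data matrix X (m x n, e.g. terms by documents), find W (m x k) and H (k x n), both nonnegative, minimizing ||X - WH||_F^2. As a minimal illustrative sketch of that baseline only, the code below fits it with the classic Lee and Seung multiplicative updates in pure Python. This is a swapped-in textbook solver, not the dissertation's algorithm: it does not implement the divide-and-conquer DC-NMF, SymNMF, HierSymNMF2, or JointNMF. The helper names (`matmul`, `transpose`, `frob_error`, `nmf_mu`) and the toy matrix are hypothetical.

```python
import random


def matmul(A, B):
    """Multiply dense matrices stored as lists of rows (A is m x p, B is p x n)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def frob_error(X, W, H):
    """Squared Frobenius norm ||X - WH||_F^2."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for xr, yr in zip(X, WH) for x, y in zip(xr, yr))


def nmf_mu(X, k, iters=200, seed=0, eps=1e-12):
    """Fit X ~ W @ H with W (m x k) >= 0 and H (k x n) >= 0 (hypothetical helper).

    Uses Lee and Seung multiplicative updates for the Frobenius-norm objective.
    Each entry is rescaled by a nonnegative ratio, so W and H stay nonnegative.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H); eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, ar, br)]
             for hr, ar, br in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, ar, br)]
             for wr, ar, br in zip(W, num, den)]
    return W, H


# Toy 5 x 4 nonnegative matrix (made-up values, e.g. term counts per document).
X = [[5, 3, 0, 1],
     [4, 0, 0, 1],
     [1, 1, 0, 5],
     [1, 0, 0, 4],
     [0, 1, 5, 4]]
W, H = nmf_mu(X, k=2)
print("residual:", frob_error(X, W, H))
```

Multiplicative updates are used here only because they need no inner solver and preserve nonnegativity automatically; they often converge more slowly than alternating nonnegative least squares solvers. In a clustering setting, one common reading is that each column of H gives soft memberships of a document in k clusters, so a hard assignment takes the row index of the largest entry in that column.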
