学位论文详细信息
Multi-dimensional mining of unstructured data with limited supervision
data mining;multi-dimensional analysis;less supervision
Zhang, Chao
关键词: data mining;    multi-dimensional analysis;    less supervision;   
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/102465/ZHANG-DISSERTATION-2018.pdf?sequence=1&isAllowed=y
美国|英语
来源: The Illinois Digital Environment for Access to Learning and Scholarship
PDF
【 摘 要 】

As one of the most important data forms, unstructured text data plays a crucial role in data-driven decision making in domains ranging from social networking and information retrieval to healthcare and scientific research.In many emerging applications, people's information needs from text data are becoming multi-dimensional---they demand useful insights for multiple aspects from the given text corpus.However, turning massive text data into multi-dimensional knowledge remains a challenge that cannot be readily addressed by existing data mining techniques.In this thesis, we propose algorithms that turn unstructured text data into multi-dimensional knowledge with limited supervision. We investigate two core questions: 1. How to identify task-relevant data with declarative queries in multiple dimensions?2. How to distill knowledge from data in a multi-dimensional space?To address the above questions, we propose an integrated cube construction and exploitation framework.First, we develop a cube construction module that organizes unstructured data into a cube structure, by discovering latent multi-dimensional and multi-granular structure from the unstructured text corpus and allocating documents into the structure.Second, we develop a cube exploitation module that models multiple dimensions in the cube space, thereby distilling multi-dimensional knowledge from data to provide insights along multiple dimensions.Together, these two modules constitute an integrated pipeline: leveraging the cube structure, users can perform multi-dimensional, multi-granular data selection with declarative queries; and with cube exploitation algorithms, users can make accurate cross-dimension predictions or extract multi-dimensional patterns for decision making.The proposed framework has two distinctive advantages when turning text data into multi-dimensional knowledge: flexibility and label-efficiency. First, it enables acquiring multi-dimensional knowledge flexibly, as the cube structure allows users to easily identify task-relevant data along multiple dimensions at varied granularities and further distill multi-dimensional knowledge. Second, the algorithms for cube construction and exploitation require little supervision; this makes the framework appealing for many applications where labeled data are expensive to obtain.

【 预 览 】
附件列表
Files Size Format View
Multi-dimensional mining of unstructured data with limited supervision 15626KB PDF download
  文献评价指标  
  下载次数:5次 浏览次数:10次