With massive datasets accumulating in text repositories (e.g., news articles and customer reviews), it is highly desirable to explore and exploit them systematically using data mining, NLP, and database techniques. In our view, documents in text corpora carry informative explicit meta-attributes (e.g., category, date, author) and implicit attributes (e.g., sentiment), which together form one or more highly structured multi-dimensional spaces. Much knowledge can be derived from these spaces given effective and efficient technologies for multi-dimensional summarization, exploration, and analysis.

In this demo, we present TextDive, an end-to-end, real-time analytical platform that processes massive text data and provides valuable insights to general data consumers. First, we develop a set of information extraction, entity typing, and text mining methods to extract consolidated dimensions and automatically construct multi-dimensional textual spaces (i.e., text cubes). Second, we develop a set of OLAP-like text summarization, data exploration, and text analysis mechanisms that capture the semantics of text corpora in these multi-dimensional spaces. Finally, we develop an efficient computational solution that materializes selected statistics so that TextDive remains interactive and responds in real time.
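To make the text-cube idea concrete, the following is a minimal sketch and not TextDive's actual implementation. It assumes a toy corpus in which dimension values are already extracted, uses raw term counts as the only cell statistic, and materializes only cuboids with at most two dimensions, falling back to a scan for anything finer. All names here (`DOCS`, `DIMENSIONS`, `build_cube`, `query`, `max_dims`) are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy corpus: explicit dimensions (category, year) and an
# implicit one (sentiment) are assumed to be already extracted.
DOCS = [
    {"category": "sports", "year": 2016, "sentiment": "positive",
     "text": "great match great win"},
    {"category": "sports", "year": 2017, "sentiment": "negative",
     "text": "poor match"},
    {"category": "politics", "year": 2016, "sentiment": "negative",
     "text": "election dispute"},
]
DIMENSIONS = ("category", "year", "sentiment")


def build_cube(docs, dims, max_dims=2):
    """Materialize term counts for every cell of every cuboid that
    constrains at most `max_dims` dimensions (selective materialization)."""
    cube = defaultdict(Counter)
    for k in range(max_dims + 1):
        for subset in combinations(dims, k):
            for doc in docs:
                # A cell is identified by its (dimension, value) pairs.
                cell = tuple((d, doc[d]) for d in subset)
                cube[cell].update(doc["text"].split())
    return cube


def query(cube, docs, **constraints):
    """Return term counts for the cell selected by `constraints`.
    Uses the materialized cell if present, otherwise scans the documents."""
    cell = tuple((d, constraints[d]) for d in DIMENSIONS if d in constraints)
    if cell in cube:
        return cube[cell]
    counts = Counter()
    for doc in docs:
        if all(doc[d] == v for d, v in cell):
            counts.update(doc["text"].split())
    return counts


cube = build_cube(DOCS, DIMENSIONS)
# Roll-up: dropping constraints moves to a coarser cell.
sports_terms = query(cube, DOCS, category="sports")
# Drill-down: adding constraints moves to a finer cell.
sports_2016_terms = query(cube, DOCS, category="sports", year=2016)
```

Capping the number of materialized dimensions trades storage for query time: coarse cells are answered by lookup, while rarely requested fine-grained cells are computed on demand.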
TextDive: construction, summarization and exploration of multi-dimensional text corpora