学位论文详细信息
A schema conversion approach for constructing heterogeneous information networks from documents
Information network construction
Kim, Hyung Sul
关键词: Information network construction;   
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/97387/KIM-DISSERTATION-2017.pdf?sequence=1&isAllowed=y
美国|英语
来源: The Illinois Digital Environment for Access to Learning and Scholarship
PDF
【 摘 要 】

Information networks with multi-typed nodes and edges with different semantics are called heterogenous information networks. Since heterogeneous information networks embed more complex information than homogeneous information networks due to their multi-typed nodes and edges, mining such networks has produced richer knowledge and insights.To extend the application of heterogeneous information network analysis to document analysis, it is necessary to build information networks from a collection of documents while preserving important information in the documents.This thesis describes a schema conversion approach to apply data mining techniques on the outcomes of natural language processing (NLP) tools to construct heterogeneous information networks.First, we utilize named entity recognition (NER) tools to explore networks over entities, topics, and words to demonstrate how a probabilistic model can convert the data schema of the NER tools. Second, we address a pat- tern mining method to construct a network with authors, documents, and writing styles by extracting discriminative writing styles from parse trees and converting them into nodes in a network. Third, we introduce a clustering method to merge redundant nodes in an information network with documents, claims, subjective, objective, and verbs. We use a semantic role labeling (SRL) tool to get initial network structures from news articles, and merge duplicated nodes using a similarity measure SynRank. Finally, we present a novel event mining framework for extracting high-quality structured event knowledge from large, redundant, and noisy news data. The proposed framework ProxiModel utilizes named entity recognition, time expression extraction, and phrase mining tools to get event information from documents.

【 预 览 】
附件列表
Files Size Format View
A schema conversion approach for constructing heterogeneous information networks from documents 2723KB PDF download
  文献评价指标  
  下载次数:4次 浏览次数:5次