Thesis details
Utilizing multiple entities from collection of unstructured documents in constructing attribute-value pairs
Cho, Hyun Duk; Zhai, ChengXiang
Keywords: attribute extraction; attribute-value pair (NVP); value extraction; evaluation
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/34506/Cho_Hyun%20Duk.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Attribute-value pair (NVP) extraction is defined as extracting words that express characteristics of an entity and associating those words with the words or phrases that best describe the attributes. Applications of NVP arise in various related areas, ranging from sentiment analysis and populating and error-checking relational databases to broader text information tasks such as QA systems, search, and review modeling. We propose an unsupervised method to identify the properties of entities, represented as NVP, from unstructured documents. Other approaches that extract NVP usually utilize supervised or semi-supervised methods on structured or semi-structured documents. The benefit of such approaches is that they tend to have higher accuracy than unsupervised approaches on unstructured documents. Furthermore, supervised approaches are better suited to distinguishing attribute words from value words than unsupervised approaches on unstructured documents. The biggest drawback of these methods, however, is that training data may not always be available and not all documents can be thought of as being structured. In this thesis, we first propose an approach to extracting and distinguishing attribute words and value words from unstructured documents. Since entities of the same class share similar attributes, we propose that the identification of relevant attributes should be done across entities belonging to the same class, and demonstrate that this can lead to a significant performance gain in attribute extraction, even when only documents describing a modest number of entities per class are available. We then propose a way to evaluate the accuracy of attribute-value pairs automatically, allowing for quantitative comparison between different systems that is more consistent and cost-effective than manual evaluation. Automatic evaluation techniques have been used in evaluating summarization and comparing ontologies, but they have not been utilized in evaluating NVP.
Both the automated and manual evaluations show that our system outperforms a comparison system.

【 Preview 】
Attachment list
Files Size Format View
Utilizing multiple entities from collection of unstructured documents in constructing attribute-value pairs 2024KB PDF download