This thesis focuses on relation extraction from unstructured text. We are interested in the bootstrapping approach, in which only a small number of seed examples are given to train the extractor. Training the extractor amounts to finding good textual patterns that express the target relation; the duality between tuples and patterns is exploited iteratively, with each reinforcing the other. However, because little labelled data is available at the start, bootstrapping performance is often unsatisfactory. Recent work explores additional meta-level information, such as constraints, and adds it alongside the bootstrapping seeds to strengthen supervision. Our approach goes a step further by exploring how to better incorporate such domain-specific constraints into the ranking process that selects textual patterns, in order to improve extraction precision and recall. We therefore call it a constraint-based metric-aware approach. We explore three types of general constraints and develop models for each of them. Finally, we conduct experiments on a dataset of Wikipedia articles, and the results show that our model achieves a significant improvement in F1 score.
Constraint-based metric-aware approach for relation co-extraction