Dissertation Details
Semantic Feature Extraction Using Multi-Sense Embeddings and Lexical Chains
Keywords: Synsets; WordNet; MSSA; Natural language processing; Semantics; Lexical chains; Computer and Information Science; College of Engineering & Computer Science
Ruas, Terry L.; Zakarian, Armen
University of Michigan
Others: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/149647/Terry%20Ruas%20Final%20Dissertation.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

The relationships between words in a sentence often tell us more about the underlying semantic content of a document than the individual words themselves. In the last few years, natural language understanding has seen an increasing effort to develop techniques that produce non-trivial features, especially after robust word embedding models became prominent by proving able to capture and represent semantic relationships from massive amounts of data. These new dense vector representations do raise the baseline in natural language processing, but they still fall short in dealing with intrinsic linguistic issues such as polysemy and homonymy. Systems that have natural language at their core can be affected by a weak semantic representation of human language, resulting in inaccurate outcomes based on poor decisions.
In this context, word sense disambiguation and lexical chains have been explored as alternatives to alleviate several problems in linguistics, such as semantic representation, definitions, differentiation, polysemy, and homonymy. However, little effort has been made to combine recent advances in token embeddings (e.g., words, documents) with word sense disambiguation and lexical chains. To help bridge these areas, this work proposes, as its main contributions, a collection of algorithms to extract semantic features from large corpora: MSSA, MSSA-D, MSSA-NR, FLLC II, and FXLC II. The MSSA techniques focus on disambiguating and annotating each word by its specific sense, considering the semantic effects of its context. The lexical chain techniques derive the semantic relations between consecutive words in a document in a dynamic and a pre-defined manner. The goal of these original techniques is to uncover the implicit semantic links between words using their lexical structure, incorporating multi-sense embeddings, word sense disambiguation, lexical chains, and lexical databases.
A few natural language problems are selected to validate the contributions of this work, and on these our techniques outperform state-of-the-art systems. All the proposed algorithms can be used separately as independent components or combined in a single system to improve the semantic representation of words, sentences, and documents. They can also work in a recurrent form, further refining their results.
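
As a rough illustration of the general idea behind context-based sense annotation mentioned in the abstract, the sketch below picks, for an ambiguous word, the sense whose vector is closest to the average vector of its context words. This is not the MSSA algorithm from the dissertation; the sense labels, the 3-dimensional vectors, and the names `SENSE_VECTORS`, `WORD_VECTORS`, and `pick_sense` are all hypothetical and chosen only for this example.

```python
import math

# Hypothetical toy data: hand-made 3-d vectors, not taken from the dissertation.
SENSE_VECTORS = {
    "bank": {
        "bank.finance": (0.9, 0.1, 0.0),
        "bank.river": (0.0, 0.2, 0.9),
    },
}
WORD_VECTORS = {
    "money": (1.0, 0.0, 0.0),
    "deposit": (0.8, 0.3, 0.0),
    "water": (0.0, 0.1, 1.0),
    "shore": (0.1, 0.2, 0.8),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def pick_sense(word, context):
    """Return the sense label of `word` closest to the averaged context vector.

    Returns None when the word has no known senses or no context word is known.
    """
    senses = SENSE_VECTORS.get(word)
    known = [WORD_VECTORS[w] for w in context if w in WORD_VECTORS]
    if not senses or not known:
        return None
    context_vector = mean_vector(known)
    return max(senses, key=lambda label: cosine(senses[label], context_vector))


print(pick_sense("bank", ["money", "deposit"]))
print(pick_sense("bank", ["water", "shore"]))
```

With these toy vectors, a financial context selects the `bank.finance` label and a riverside context selects `bank.river`. A real system would draw its sense inventory from a lexical database such as WordNet and its vectors from a trained embedding model.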

【 Preview 】
Attachments
Files | Size | Format | View
Semantic Feature Extraction Using Multi-Sense Embeddings and Lexical Chains | 2756 KB | PDF | download