Many Natural Language Processing (NLP) techniques have been applied in the field of Question Answering (QA) to understand natural language queries. Practical QA systems classify a natural language query into vertical domains and determine whether it is similar to a question with known or latent answers. Current mobile personal assistant applications process queries that are recognized from voice input or translated from cross-lingual queries. Theoretically speaking, all these problems rely on an intuitive notion of semantic distance. However, this distance is neither definable nor computable. Many studies attempt to approximate such a semantic distance in heuristic ways, for instance with distances based on synonym dictionaries. In this paper, we propose a unified algorithm that approximates the semantic distance using the well-defined theory of information distance. The algorithm depends on a pre-constructed data structure, semantic clusters, which is built automatically from 35 million question-answer pairs. Based on this semantic measurement of questions, we implement two practical NLP systems: a question classifier and a translation corrector. A series of comparison experiments has been conducted on both implementations. Experimental results demonstrate that our distance-based approach produces fewer classification errors than other academic works. In addition, our translation correction system achieves significant improvements over the Google translation results.
A Semantic Distance of Natural Language Queries Based on Question-Answer Pairs