The query-based search paradigm rests on the assumption that searchers are able to come up with effective differentiator terms that make their queries specific and precise. In reality, however, a large number of queries are problematic: they return either too many or no relevant documents in the initial search results. Existing search systems provide no assistance to users who cannot formulate an effective keyword query and receive search results of poor quality. In some cases, users may intentionally formulate broad or exploratory queries (for example, when they want to explore a particular topic without having a clear search goal). In other cases, users may not know the domain of the search problem sufficiently well, and their queries may suffer from problems of which they are not aware, such as ambiguity or vocabulary mismatch. Although the quality of search results can be improved by reformulating a query, finding a good reformulation is often non-trivial and takes time. Therefore, in addition to the existing work on using relevant documents from the top-ranked initially retrieved results to retrieve more relevant documents, it is important from both theoretical and practical points of view to develop an interactive retrieval model that allows search systems to improve the users' search experience with exploratory queries, which return too many relevant documents, and difficult queries, which return no relevant documents in the initial search results.

In this thesis, we propose and study three methods for interactive feedback that allow search systems to interactively improve the quality of retrieval results for difficult and exploratory queries: question feedback, sense feedback and concept feedback. All three methods are based on a novel question-guided interactive retrieval model, in which a search system collaborates with the users in achieving their search goals by generating natural language refinement questions.

The first method, \textit{question feedback}, is aimed at interactive refinement of short, exploratory keyword queries by automatically generating a list of clarification questions, which can be presented next to the standard ranked list of retrieved documents. Clarification questions place broad query terms into a specific context and help the user focus on and explore a particular aspect of the query topic. By clicking on a question, users are presented with its answer, and by clicking on the answer they are redirected to the document containing it for further exploration. Clarification questions can therefore be regarded as shortcuts to specific answers. Questions also provide a more natural mechanism for eliciting relevance feedback: a query can be expanded with the terms from a clicked question and resubmitted to the search system, generating a new set of questions and documents retrieved with the expanded query. Enabling interactive question-based retrieval requires major changes to all components of the retrieval process, from more sophisticated methods of content analysis to ranking and feedback. Specifically, we propose methods to locate and index the content that can be used for question generation, and to generate and rank well-formed, meaningful questions in response to user queries.
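The expand-and-resubmit loop described above can be summarized in a short sketch. The Python fragment below is a minimal illustration only; \texttt{retrieve} and \texttt{generate\_questions} are hypothetical stand-ins for the system's ranking and question-generation components, not the actual implementation.

\begin{verbatim}
# Minimal sketch of one round of question-guided retrieval.
# `retrieve` and `generate_questions` are hypothetical placeholders.

def question_feedback_round(query, retrieve, generate_questions,
                            clicked_question=None):
    """Expand the query with terms from a clicked clarification question
    (if any), then retrieve documents and generate new questions."""
    if clicked_question is not None:
        seen = set(query.lower().split())
        extra = [t for t in clicked_question.lower().split()
                 if t not in seen]
        query = " ".join([query] + extra)
    documents = retrieve(query)                       # ranked documents
    questions = generate_questions(query, documents)  # ranked questions
    return query, documents, questions
\end{verbatim}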
We implemented a prototype of the question-guided search system on a subset of Wikipedia and conducted user studies, which demonstrated the effectiveness of the question-based feedback strategy.

The second method, \textit{sense feedback}, is aimed at clarifying the intended sense of ambiguous query terms with automatically generated clarification questions of the form \textit{``Did you mean \{ambiguous query term\} as \{sense label\}?''}, where the sense label can be a single term or a phrase. Our approach to sense detection is based on the assumption that the senses of a word can be differentiated by grouping and analyzing all the contexts in which the word appears in the collection. We propose to detect the senses of a query term by clustering the global (collection-wide) graph of relationships between the query term and the other terms in the collection vocabulary. We conducted simulation experiments with two graph clustering algorithms and two methods for calculating the strength of the relationship between terms in the graph, in order to determine the upper bound for the retrieval effectiveness of sense feedback and the best method for detecting the senses. We also proposed several alternative methods to represent the discovered senses and conducted a user study that evaluated the effectiveness of each representation method against the actual retrieval performance of user sense selections.

The third method, \textit{concept feedback}, utilizes ConceptNet, an online commonsense knowledge base and natural language processing toolkit. As opposed to ontologies and other knowledge bases, such as WordNet and Wikipedia, ConceptNet is not limited to hyponym/hypernym relations and features a more diverse relational ontology as well as a graph-based knowledge representation model, which enables more complex textual inferences. First, we conducted simulation experiments in which each query term was expanded with related concepts from ConceptNet; these demonstrated the considerable upper-bound potential of tapping into a knowledge base to overcome the lack of positive relevance signals in the initial retrieval results for difficult queries. Second, we proposed and experimentally evaluated heuristic and machine-learning-based methods for selecting a small number of candidate concepts for query expansion. The experimental results on multiple data sets indicate that concept feedback can effectively improve the retrieval performance of difficult queries, both in isolation and in combination with pseudo-relevance feedback.
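To illustrate the sense-detection step of the second method: the sketch below clusters a co-occurrence graph built around an ambiguous term. It assumes the \texttt{networkx} library and simple window-based co-occurrence weights; the thesis evaluates two clustering algorithms and two relatedness measures, of which this shows only one plausible instantiation.

\begin{verbatim}
# A minimal sketch of sense detection via graph clustering, assuming
# window-based co-occurrence weights and modularity clustering.
import itertools
import networkx as nx
from networkx.algorithms import community

def detect_senses(term, collection, window=10):
    """Cluster the global co-occurrence graph around `term`; each
    cluster of context terms approximates one sense."""
    graph = nx.Graph()
    for doc in collection:  # collection: iterable of token lists
        for i, tok in enumerate(doc):
            if tok != term:
                continue
            context = set(doc[max(0, i - window):i + window + 1]) - {term}
            for u, v in itertools.combinations(sorted(context), 2):
                w = graph.get_edge_data(u, v, {"weight": 0})["weight"]
                graph.add_edge(u, v, weight=w + 1)
    parts = community.greedy_modularity_communities(graph, weight="weight")
    return [set(p) for p in parts]
\end{verbatim}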
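Similarly, the term-level expansion underlying concept feedback can be sketched against the public ConceptNet web API. The endpoint and response fields below reflect the current \texttt{api.conceptnet.io} service rather than the toolkit version used in the thesis, and the simple top-\texttt{limit} cutoff stands in for the heuristic and learned concept-selection methods.

\begin{verbatim}
# A minimal sketch of ConceptNet-based term expansion; the top-k
# cutoff replaces the thesis's concept-selection methods.
import requests

def expand_term(term, limit=5):
    """Fetch English concepts related to `term` from ConceptNet."""
    url = "https://api.conceptnet.io/c/en/" + term.lower().replace(" ", "_")
    edges = requests.get(url).json().get("edges", [])
    related = []
    for edge in edges:
        for node in (edge["start"], edge["end"]):  # relation endpoints
            label = node.get("label", "")
            if node.get("language") == "en" and label.lower() != term.lower():
                related.append(label)
    return related[:limit]

# Example: expand an ambiguous query term with related concepts.
expanded_query = "jaguar " + " ".join(expand_term("jaguar"))
\end{verbatim}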