学位论文

【摘要】

People generate and share a huge variety of information that can be used to derive useful insights and create value for society.Keyword queries are by far the most popular method for users to search, explore, and understand such information. Because search is more effective when information is well organized, people often choose to add structure to their information, producing semi-structured or even structured data.However, the new structure is not always added in the right places, i.e., where it will maximize the improvement in search results.Further, seemingly unimportant changes in the structure of the information tend to affect search results in undesirable ways.In addition, current keyword search and exploration systems are not very effective for certain types of information and queries.This thesis addresses these three issues by showing how to determine where to add structure to information for maximal benefit, and how to use the resulting structure to provide effective and robust answers to keyword queries. Our first contribution is to determine what structure should be imposed on a given collection of information. Since the resources for converting information to a more structured format are always limited, our goal is to impose structure in a manner that will maximize the improvement for subsequent queries.In more technical terms, this means discovering and maintaining exactly the parts of a database schema that will most improve users' search and exploration experience.We show that adding structure corresponding to the most ``popular'' parts of a schema does not generally maximize search effectiveness. We formalize the problem of deciding which parts of a schema to extract and materialize, prove that the problem is NP-hard, propose polynomial-time solution algorithms, and prove that their effectiveness is within a constant factor of optimal.Given a collection of semi-structured information, our next contribution is a principled way to take this structure into account when answering keyword queries.Because the same information can be represented in many ways, we propose the principle of {\it design independence}, which postulates that search results should not depend on the details of the design chosen for the information's structure.Otherwise, search may be effective over one schema design but provide poor rankings over another equivalent design. We formalize this principle and show that current search approaches for semi-structured information are not design independent. We propose a new approach to keyword search over semi-structured information, prove that it is design independent, and show that it gives better search results in practice than all previous approaches. A schema describes the concepts that underlie a collection of information, but users do not usually explicitly specify these concepts in keyword queries.Thus the query term {\it Java} could refer to an island, a programming language, or a drink.When query terms can refer to multiple concepts, it is hard to devise effective heuristics to determine which answers should rank highest. In this thesis, we use the probability ranking principle and other widely accepted principles to provide a solid formal model for ranking the answers to such queries over semi-structured data. In an extensive empirical study, we show that our method generally provides more effective rankings than previous approaches.Schemas provide information about concepts and the relationships between them.Current keyword search approaches generally assume that all types of relationships are equally important to users, which can lead to poor ranking of answers.We provide a way to quantify the importance of relationships between concepts, provide an efficient way to use this metric to rank answers, and show that users prefer the resulting rankings over those of previous methods.If a query is very broad and underspecified (e.g., {\it vacation}), then even well-ranked answers are unlikely to please users.In such situations, we believe that the best approach is to detect that this is a difficult query and then use other techniques, such as query completion suggestions, to improve the user's experience. We provide a probabilistic model that accurately predicts how underspecified a query is, together with algorithms that quickly compute this measure when a user queries semi-structured data.

【预览】

附件列表
Files	Size	Format	View
Principles of usable query interfaces	1763KB	PDF	download


Principles of usable query interfaces
Keyword Search;Database Systems;Design Independence
Termehchy, Arash
关键词: Keyword Search; Database Systems; Design Independence;
Others : https://www.ideals.illinois.edu/bitstream/handle/2142/34305/Termehchy_Arash.pdf?sequence=1&isAllowed=y
美国\|英语
来源: The Illinois Digital Environment for Access to Learning and Scholarship
PDF


	文献评价指标
	下载次数：14次	浏览次数：30次

【 摘 要 】

【 预 览 】

【摘要】

【预览】