As digital data permeates every aspect of daily life, more and more end-users are organizing their everyday data electronically. End-users are already used to managing personal data such as contact books and calendars on electronic devices. Meanwhile, the desire to organize more information on the computer is spreading to a broader group of users. For example, a scientist may need to regularly manage a substantial amount of scientific data on his desktop.

However, organizing such everyday data is challenging for these end-users, because they have limited knowledge of data schemas, which are key to data management tasks such as database design, data transformation and data integration. As the user struggles with these schema tasks, various cognitive and operational burdens emerge. First, when designing her data collection, the user has the burden of abstracting her mental model of her real-life data into a reasonable schema design. Moreover, when incorporating external data sources, there is a burden to understand the source semantics and a burden to transform the data from those sources into the user's own data collection. Meanwhile, if the user wants to filter the data, she has the burden of understanding and specifying the selection condition. Finally, when existing sources are updated, there is a burden to understand and fuse these updates.

This dissertation introduces several approaches to help the end-user reduce these burdens. To ease the design burden, the dissertation proposes a system with a next-generation spreadsheet in which the end-user can easily design and evolve her schema. To facilitate the incorporation of external data sources, a sample-driven schema mapping approach is introduced: the user freely provides sample instances in her own collection, and the system automatically deduces the desired schema mapping from the sources to the collection. In a similar vein, the dissertation proposes an approach that helps the user specify selection conditions via example data points she wants to select. Finally, to help the user incorporate source data updates into her data collection, the dissertation proposes a technique to incrementally update the integrated data using previous integration results.
Reducing End-User Burden in Everyday Data Organization.