Because crowdsourcing is widely used and important for gathering training data at scale, the data management community has devoted considerable effort to understanding and optimizing fundamental primitives such as filters and joins. These primitive boolean operations, in which human responses come from a small, finite space of possible answers, are inadequate for many data analysis tasks, especially those involving images, videos, and maps. There is thus a need for open-ended crowdsourcing, which elicits more fine-grained information from humans that can be used to develop sophisticated AI systems. In this thesis, we study two popular open-ended crowdsourcing problems. The first, clustering, is the problem of organizing a collection of objects (images, videos) by allowing workers to form as many clusters as they would like and to organize items across them. The second, counting, is the problem of counting objects in images. For both problems, we develop models to reason about human behavior and use these models to design provably cost-efficient algorithms that provide high-quality results compared to currently available approaches.
Towards open-ended crowd-powered data processing: a case study of clustering and counting