This thesis describes the design and implementation of the search engine module in a novel Cloud-based Open Lab for Data Science (COLDS) system. COLDS is a general infrastructure system for supporting data science programming assignments on the cloud. It is currently being developed at the University of Illinois at Urbana-Champaign in collaboration with Microsoft and Intel, with Azure grant support from Microsoft and gift funding from Intel. The annotation subsystem of COLDS helps instructors design flexible annotation tasks and makes it straightforward to annotate data sets using search engine results. Within this subsystem, the search engine module allows instructors to upload customized data sets, builds an inverted index over each data set to support fast querying, and lets users select a ranking function with customized parameters to run a query and obtain a ranked list of results. The thesis covers the module's data set uploading and configuration procedure, the indexing of data sets, the storage of data sets and their indexes, and ranking and querying with a selected method, parameters, and data set. It also describes the background, related work, challenges, and future work of COLDS and its annotation subsystem.
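To make the indexing and ranking steps concrete, the following is a minimal sketch of an inverted index queried with a parameterized ranking function. It is not the COLDS implementation: the abstract does not name the ranking functions the module offers, so Okapi BM25 with its usual `k1` and `b` parameters is assumed here purely for illustration, and the function names `build_index` and `bm25_search` as well as the whitespace tokenizer are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}.

    `docs` maps a document id to its raw text. Tokenization is a plain
    lowercase whitespace split (an assumption made for this sketch).
    Also returns the token count of every document.
    """
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths


def bm25_search(query, index, lengths, k1=1.2, b=0.75):
    """Rank documents for `query` with BM25 (assumed ranking function).

    `k1` controls term-frequency saturation and `b` controls document
    length normalization. Returns (doc_id, score) pairs, best first.
    """
    if not lengths:
        return []
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the data set
        df = len(postings)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            norm = k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "cloud data science lab",
        "d2": "search engine ranking",
        "d3": "search engine index for data search",
    }
    index, lengths = build_index(docs)
    for doc_id, score in bm25_search("search engine", index, lengths):
        print(doc_id, round(score, 3))
```

Only documents that appear in the postings of a query term are scored, which is what lets an inverted index answer queries without scanning the whole data set. Exposing `k1` and `b` as arguments mirrors the idea of ranking with customized parameters: setting `b=0.0`, for example, turns off length normalization and can change the order of the results.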
Design and implementation of the search engine module in COLDS