The growth of the “big data” industry has created an urgent need to educate a large number of data scientists and engineers. However, learning practical data science skills requires hands-on experience with large, real-world datasets, which is difficult to offer at scale due to hardware and cost limitations. In this thesis, we present LiveDataLab, a novel cloud-based solution that enables the deployment of hands-on assignments with large, real-world datasets and integrates data science education, research, and applications in one ecosystem. LiveDataLab provides a project-based learning platform alongside open leaderboard competitions, course assignment hosting, and auto-grading. All of these applications are powered by an auto-scaling cloud backbone that delivers these capabilities at relatively low cost. LiveDataLab also serves as a platform for data science research through its integration with large, real-world datasets. It can thus meet the growing demand for data science education, support data science research, and enable data science applications on a single platform. Ultimately, LiveDataLab brings learners, educators, researchers, and application developers together to collaborate in a highly efficient big data ecosystem.
LiveDataLab: A cloud-based platform for data science education