Dissertation Details
Leveraging contextual cues for dynamic scene understanding
Bettadapura, Vinay Kumar; Essa, Irfan; Abowd, Gregory D.; Starner, Thad; Pantofaru, Caroline; Sukthankar, Rahul
University: Georgia Institute of Technology
Department: Interactive Computing
Keywords: Computer vision; Machine learning; Ubiquitous computing; Context; Activity recognition; Anomaly detection; Skill classification; Food recognition; Egocentric; Basketball highlights; Sports
Others: https://smartech.gatech.edu/bitstream/1853/54834/1/BETTADAPURA-DISSERTATION-2016.pdf
United States | English
Source: SMARTech Repository
【 Abstract 】
Environments with people are complex, with many activities and events that need to be represented and explained. The goal of scene understanding is either to determine what objects and people are doing in such complex and dynamic environments, or to know the overall happenings, such as the highlights of the scene. The context within which the activities and events unfold provides key insights that cannot be derived by studying the activities and events alone. In this thesis, we show that this rich contextual information can be successfully leveraged, along with the video data, to support dynamic scene understanding. We categorize and study four different types of contextual cues: (1) spatio-temporal context, (2) egocentric context, (3) geographic context, and (4) environmental context, and show that they improve dynamic scene understanding tasks across several different application domains. We start by presenting data-driven techniques to enrich spatio-temporal context by augmenting Bag-of-Words models with temporal, local and global causality information, and show that this improves activity recognition, anomaly detection and scene assessment from videos. Next, we leverage the egocentric context derived from sensor data captured from first-person point-of-view devices to perform field-of-view localization in order to understand the user's focus of attention. We demonstrate single and multi-user field-of-view localization in both indoor and outdoor environments, with applications in augmented reality, event understanding and studying social interactions. Next, we look at how geographic context can be leveraged to make challenging "in-the-wild" object recognition tasks more tractable, using the problem of food recognition in restaurants as a case study.
Finally, we study the environmental context obtained from dynamic scenes such as sporting events, which take place in responsive environments such as stadiums and gymnasiums, and show that it can be successfully used to address the challenging task of automatically generating basketball highlights. We perform comprehensive user studies on 25 full-length NCAA games and demonstrate the effectiveness of environmental context in producing highlights that are comparable to the highlights produced by ESPN.
Attachments
File | Size | Format | View
Leveraging contextual cues for dynamic scene understanding | 28360 KB | PDF | download