Real-world data now come from a growing number of sources, including physical sources such as satellites and seismic sensors as well as social networks and web logs. While progress has been made in filtering individual social networks, there are significant advantages to integrating big data from multiple sources. For physical events, integrating physical sensor data with social network data can improve filtering efficiency and the quality of results beyond what is feasible with either data stream alone. Disasters are representative physical events with real-world impact. In this dissertation, I present the LITMUS system, which combines data from both physical sensors and social networks to provide information about physical events in near real-time. My work consists of four parts: 1) integrating multiple sources for landslide detection, 2) filtering out noise from social media, 3) geo-tagging data from social media, and 4) sharing the collected data with the research community. In part I, I introduce the physical event information service LITMUS, which combines multiple physical sensors and social media to handle the inherently varied origins and composition of multi-hazards, such as landslides. In part II, I propose a classification approach based on the similarity of texts to Wikipedia articles, followed by a new approach for fast text classification using randomized ESA (Explicit Semantic Analysis), and further improve classification accuracy using a rapid ensemble classification system. In part III, I address the lack of geo-tagged data in social media by proposing location estimation based on a composition of clustering algorithms. In part IV, I describe our Twitter dataset of landslide events, one of the largest annotated datasets available to date, and illustrate its uses.
Landslide information service based on composition of physical and social information services