Recently, Twitter data has become important for a wide range of real-time applications, including real-time event detection, topic detection, and disaster and emergency management. These applications need to know the precise location of tweets for their analysis. However, only approximately 1% of tweets are finely-grained geotagged, which is insufficient for such applications. To overcome this limitation, predicting the location of non-geotagged tweets, while challenging, can increase the sample of geotagged data available to support the applications mentioned above. Nevertheless, existing approaches to tweet geolocalisation mostly focus on geolocating tweets at a coarse-grained level of granularity (i.e., city or country level). Thus, geolocalising tweets at a fine-grained level (i.e., street or building level) has arisen as a new open research problem. In this thesis, we investigate the problem of inferring the geolocation of non-geotagged tweets at a fine-grained level of granularity (i.e., at most 1 km error distance). In particular, we aim to predict the location where a given tweet was generated using its text as a source of evidence. This thesis states that the geolocalisation of non-geotagged tweets at a fine-grained level can be achieved by exploiting the characteristics of the 1% of already available individual finely-grained geotagged tweets provided by the Twitter stream. We evaluate the state of the art, derive insights into its issues, and propose an evolution of techniques to achieve the geolocalisation of tweets at a fine-grained level.

First, we explore the existing approaches in the literature for tweet geolocalisation and derive insights into the problems they exhibit when adapted to work at a fine-grained level. To overcome these problems, we propose a new approach that ranks individual geotagged tweets based on their content similarity to a given non-geotagged tweet.
Our experimental results show significant improvements over previous approaches.

Next, we explore the predictability of the location of a tweet at a fine-grained level in order to reduce the average error distance of the predictions. We postulate that, to obtain a fine-grained prediction, a correlation between similarity and geographical distance should exist, and we define the boundaries where fine-grained predictions can be achieved. To do so, we incorporate into the ranking approach a majority voting algorithm that assesses whether such a correlation exists by exploiting the geographical evidence encoded within the Top-N most similar geotagged tweets in the ranking. We report experimental results and demonstrate that, by considering this geographical evidence, we can reduce the average error distance, but at a cost in coverage (the number of tweets for which our approach can find a fine-grained geolocation).

Furthermore, we investigate whether the quality of the ranking of the Top-N geotagged tweets affects the effectiveness of fine-grained geolocalisation, and propose a new approach to improve the ranking. To this end, we adopt a learning to rank approach that re-ranks geotagged tweets based on their geographical proximity to a given non-geotagged tweet. We test different learning to rank algorithms and propose multiple features to model fine-grained geolocalisation. Moreover, we investigate the best performing combination of features for fine-grained geolocalisation.

This thesis also demonstrates the applicability and generalisation of our fine-grained geolocalisation approaches in a practical scenario related to a traffic incident detection task.
We show the effectiveness of using new geolocalised incident-related tweets in detecting the geolocation of real incident reports, and demonstrate that we can improve the overall performance of the traffic incident detection task by enhancing the already available geotagged tweets with new tweets that were geolocalised using our approach.

The key contribution of this thesis is the development of effective approaches for geolocalising tweets at a fine-grained level. The thesis provides insights into the main challenges of achieving fine-grained geolocalisation, derived from exhaustive experiments over a ground truth of geotagged tweets gathered from two different cities. Additionally, we demonstrate the effectiveness of our fine-grained geolocalisation approaches in a traffic incident detection task by using them to geolocalise new incident-related tweets.
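
The ranking and majority voting steps described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not specify the similarity function, the voting rule, or how locations are grouped, so the bag-of-words cosine similarity, the strict-majority rule, the 0.01-degree grid cells (roughly 1 km in latitude), and all function names and sample tweets below are assumptions made for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors (assumed measure)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def rank_geotagged(query_text, geotagged, top_n=5):
    """Rank geotagged tweets by content similarity to the query; keep the Top-N matches."""
    scored = [(cosine_similarity(query_text, text), coords) for text, coords in geotagged]
    scored = [s for s in scored if s[0] > 0.0]  # drop tweets sharing no terms with the query
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_n]


def cell_of(coords, cell_deg):
    """Map a (lat, lon) pair to a grid cell identifier (hypothetical grouping scheme)."""
    lat, lon = coords
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def predict_location(query_text, geotagged, top_n=5, cell_deg=0.01):
    """Return a (lat, lon) prediction, or None when the Top-N tweets do not agree.

    Returning None is what trades coverage for a lower average error distance.
    """
    ranked = rank_geotagged(query_text, geotagged, top_n)
    if not ranked:
        return None
    votes = Counter(cell_of(coords, cell_deg) for _, coords in ranked)
    cell, count = votes.most_common(1)[0]
    if count * 2 <= len(ranked):  # no strict majority: abstain
        return None
    winners = [coords for _, coords in ranked if cell_of(coords, cell_deg) == cell]
    lat = sum(c[0] for c in winners) / len(winners)
    lon = sum(c[1] for c in winners) / len(winners)
    return (lat, lon)


if __name__ == "__main__":
    # Invented sample data: (tweet text, (latitude, longitude)).
    sample = [
        ("accident on great western road traffic blocked", (55.8750, -4.2903)),
        ("traffic accident great western road", (55.8752, -4.2905)),
        ("great western road closed after accident", (55.8755, -4.2902)),
        ("lovely coffee in the city centre", (55.8609, -4.2514)),
    ]
    print(predict_location("accident blocking great western road", sample, top_n=3))
```

Under these assumptions, a tweet is geolocalised only when more than half of its Top-N most similar geotagged tweets fall in the same cell; otherwise the sketch abstains. The `haversine_km` helper shows how the error distance of a prediction could be checked against the 1 km threshold.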
Inferring the geolocation of tweets at a fine-grained level