学位论文详细信息
Outlier detection for information networks
outlier detection;Community Distribution\rOutliers (CDOutliers);Evolutionary Community Outliers (ECOutliers);toread;data mining;outlier detection for graphs;outlier detection for networks;graph query processing;community detection;community outliers;anomalies;anomaly detection;evolution in networks
Gupta, Manish
关键词: outlier detection;    Community Distribution\rOutliers (CDOutliers);    Evolutionary Community Outliers (ECOutliers);    toread;    data mining;    outlier detection for graphs;    outlier detection for networks;    graph query processing;    community detection;    community outliers;    anomalies;    anomaly detection;    evolution in networks;   
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/44770/Manish_Gupta.pdf?sequence=1&isAllowed=y
美国|英语
来源: The Illinois Digital Environment for Access to Learning and Scholarship
PDF
【 摘 要 】

The study of networks has emerged in diverse disciplines as a means of analyzing complex relationship data. There has been a significant amount of work in network science which studies properties of networks, querying over networks, link analysis, influence propagation, network optimization, and many other forms of network analysis. Only recently has there been some work in the area of outlier detection for information network data. Outlier (or anomaly) detection is a very broad field and has been studied in the context of a large number of application domains. Many algorithms have been proposed for outlier detection in high-dimensional data, uncertain data, stream data and time series data. By its inherent nature, network data provides very different challenges that need to be addressed in a special way. Network data is gigantic, contains nodes of different types, rich nodes with associated attribute data, noisy attribute data, noisy link data, and is dynamically evolving in multiple ways. This thesis focuses on outlier detection for such networks with respect to two interesting perspectives: (1) community based outliers and (2) query based outliers.For community based outliers, we discuss the problem in both static as well as dynamic settings. Usually objects in a network form multiple communities. Most of the objects follow some popular community distribution patterns and also follow similar patterns of evolution. In a static network setting, one may find some objects each of whose community distribution does not follow any of the popular community distribution patterns. We refer to such outliers as Community Distribution Outliers (CDOutliers). The major challenge lies in extracting community patterns for various object types in an integrated way. We follow an iterative two-stage approach to identify CDOutliers, which performs pattern discovery (based on a joint non-negative matrix factorization approach) and outlier detection in a tightly integrated manner. In the dynamic setting, there are some objects which evolve in a very different way relative to other community members, and we define such objects as temporal community outliers. One of our studies is related to finding such outliers given two snapshots of a network (Evolutionary Community Outliers (ECOutliers)) while the other study is more general and focuses on a setting of multiple network snapshots (Community Trend Outliers (CTOutliers)). In both the studies, temporal patterns are discovered in an outlier-aware manner, and then outliers are discovered based on such patterns. The major challenge lies in performing cluster matching across snapshots so as to obtain temporal patterns. Another challenge is to define the outlier score once the patterns have been discovered. We propose algorithms and demonstrate the effectiveness of our algorithms in finding such outliers using both synthetic and real datasets.The other important aspect of my research is query based outlier detection. Given a heterogeneous network and a user subgraph query, the aim is to allow the user to find top-K ranked matches for the query in the network. Matches are ranked based on the rare and surprising associations (expressed as edges) within them. My work focuses on outlier detection with respect to two main kinds of queries: clique queries and general subgraph queries. This work can be useful for many critical applications like finding the most impacted subgraphs in computer networks during network attacks, finding the most critical locations to send extra aid on battlefields, etc. For the clique queries, we propose a low-cost shared neighbors index to assist subgraph matching and then propose a stricter version of negative association strength as a measure of outlierness of an association. The cliques are ranked based on the sum of outlierness of their edges and the top few are returned as Association Based Clique Outliers (ABCOutliers). For general subgraph queries, we discover top-K outliers by performing matching and outlier scoring in an integrated way. We pre-compute a graph topology index and a graph maximum metapath weight index. We exploit these data structures to propose a novel top-K algorithm for fast computation of subgraph outliers.The proposed concept of outlier detection from networks opens up a new direction of outlier detection research. The detected outliers, which cannotbe found by traditional outlier detection techniques, provide new insights into the application area. The algorithms we developed can be applied to many areas, including social network analysis, cyber-security, distributed systems, health care, and bio-informatics. As both the amount of data as well as the linkage increase in a variety of domains, such network-based techniques will find more applications and more opportunities for research for various settings.

【 预 览 】
附件列表
Files Size Format View
Outlier detection for information networks 4636KB PDF download
  文献评价指标  
  下载次数:12次 浏览次数:20次