学位论文详细信息
Algorithms on graph-structured data with imperfect information
Graph data;Imperfect information;Efficient inference
Zhou, Huozhi ; Varshney ; Lav R.
关键词: Graph data;    Imperfect information;    Efficient inference;   
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/105609/ZHOU-THESIS-2019.pdf?sequence=1&isAllowed=y
美国|英语
来源: The Illinois Digital Environment for Access to Learning and Scholarship
PDF
【 摘 要 】
Graph-structured data is able to characterize pairwise or even higher-order relations among different data points, and has been demonstrated to be highly advantageous in various data mining and machine learning applications. Such graph-structured data may either come from real life networks, or some transformation based on data points. However, in practice the measurement of graph-structured data is usually partially incomplete or incorrect. For example, the measured states of nodes in the graph might be incorrect due to sensor noise. In this thesis, we study two problems on graph-structured data with imperfect information: hypergraph-based active learning and source estimation on directed acyclic graphs (DAGs), all with provable statistical guarantees.In the first part of this thesis, we propose an active learning scheme which is able to accommodate the structure of hypergraphs, termed HS2. HS2 generalizes the previously proposed S2 algorithm which is only able to solve graph-based active learning (GAL) with pointwise oracle. Our HS2 is more flexible in the sense that it is adaptable for three different types of oracles: pointwise oracle, pairwise oracle, as well as noisy pairwise oracle. Based on a novel parametric system particularly designed for hypergraphs, we derive theoretical results on the query complexity of HS2 for the above described settings. Both the theoretical and empirical results show that HS2 outperforms the naive combination of clique expansion and GAL algorithms.Next we develop a heuristic, termed generalized Jordan center (GJC), to estimate the source of a spreading process on a DAG based on noisy and incomplete observations. This problem is motivated by contamination diffusion in a food supply chain. For this setting, identifying the source correctly and efficiently as well as inferring states of unobserved events are of top priorities (the recall problem). We believe this is the first work on source estimation with noisy information. Under mild conditions, GJC is the maximum likelihood (ML) estimator of the diffusion source. Our proposed heuristic is parameter-free (only needs to know the structure of the DAG and states of some nodes), and can be evaluated efficiently by a message-passinglike algorithm in ~O (jV j) complexity (the tilde notation means ignoring the logarithm factor), where V is the vertex set. Experiments on both synthetic and real networks show that GJC has significant gains over a naive extension of Jordan center and is comparable to the exact ML estimate, in terms of source detection probability and false negative rate for recall.
【 预 览 】
附件列表
Files Size Format View
Algorithms on graph-structured data with imperfect information 780KB PDF download
  文献评价指标  
  下载次数:7次 浏览次数:20次