Dissertation Details
Online and active learning of big networks: theory and algorithms
Gu, Quanquan
Keywords: Big Networks; Online Learning; Active Learning
Others: https://www.ideals.illinois.edu/bitstream/handle/2142/50757/Quanquan_Gu.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
PDF
【 Abstract 】
We are living in the Internet Age, in which information entities and objects are interconnected, thereby forming gigantic information networks. These networks are not only massive, but also grow and evolve very quickly. It is critical to process and understand these networks quickly in order to enable data-driven applications. On the other hand, labels for the nodes in big networks are scarce. Because it is unrealistic to obtain a label for every node, it is urgent to optimize the process by which labels are collected. The objective of my research is to develop algorithms for big network analytics that are both statistically and computationally efficient, with provable guarantees on their performance. In particular, I present active learning, online learning, selective sampling (online active learning), and online learning with bandit feedback algorithms for learning on a network.

In the first part of this thesis, I propose a nonadaptive active learning approach on a graph, based on generalization error bound minimization. In particular, I present a data-dependent error bound for a graph-based learning method, namely learning with local and global consistency (LLGC). I show that the empirical transductive Rademacher complexity of the function class for LLGC provides a natural criterion for active learning. The resulting active learning approach selects a subset of nodes on a graph such that the empirical transductive Rademacher complexity of LLGC is minimized. I propose a simple yet effective sequential optimization algorithm to solve this problem.

In the second part of this thesis, I first present an online learning algorithm on a graph for binary node classification. It is an online version of the well-known learning with local and global consistency method (OLLGC). It is essentially a second-order online learning algorithm, and can be seen as online ridge regression in the Hilbert space of functions defined on a graph.
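The LLGC method named above is the label-propagation scheme of Zhou et al., which admits a closed-form solution F* = (1 - α)(I - αS)^{-1}Y over the normalized graph similarity S. A minimal sketch on a toy graph (the weight matrix, label assignment, and α value here are illustrative assumptions, not from the thesis):

```python
import numpy as np

# Toy similarity graph over 4 nodes (symmetric, illustrative weights).
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Symmetrically normalized similarity S = D^{-1/2} W D^{-1/2}.
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
S = D_inv_sqrt @ W @ D_inv_sqrt

# One-hot labels for labeled nodes, zero rows for unlabeled ones:
# node 0 is class 0, node 3 is class 1, nodes 1 and 2 are unlabeled.
Y = np.array([[1, 0],
              [0, 0],
              [0, 0],
              [0, 1]], dtype=float)

alpha = 0.5  # trade-off between neighborhood smoothness and initial labels

# Closed-form LLGC solution: F* = (1 - alpha) * (I - alpha * S)^{-1} Y.
F = (1 - alpha) * np.linalg.solve(np.eye(4) - alpha * S, Y)
pred = F.argmax(axis=1)  # predicted class for every node
```

The labeled nodes keep their classes, and the unlabeled nodes inherit labels diffused through the graph; the thesis's active learning question is *which* rows of Y to fill in, chosen by minimizing the transductive Rademacher complexity of this function class.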
I prove its regret bound in terms of a structural property (the cut size) of the graph. Based on OLLGC, I present a selective sampling algorithm, namely Selective Sampling with Local and Global Consistency (SSLGC), which adaptively queries the label of each node based on the confidence of the linear function on the graph. Its regret bound as well as its label complexity are also derived. I also analyze the low-rank approximation of graph kernels, which enables the online algorithms to scale to large graphs.

In the third part of this thesis, I first extend the online binary classification algorithm to the multi-class regime, based on a spectral learning technique. The resulting algorithm is called online spectral learning on a graph (OSLG). Then I study online learning on a graph with bandit feedback, where, after the learner predicts a node label, the oracle returns a single bit indicating whether the prediction is correct. I propose an algorithm, namely online spectral learning on a graph with bandit feedback (OSLG Bandit), which uses an upper confidence bound of the nonparametric classifier on the graph to trade off exploration and exploitation of label information. I derive regret bounds for both algorithms, which clearly characterize the difficulty of online learning on a graph for multi-class node classification, both in the full-information setting and in the partial-information setting. All in all, my efforts in this thesis have provided important findings in these areas, and have yielded new algorithms and theory.
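The selective-sampling idea described above — a second-order (ridge-style) online learner that queries a label only when its prediction is not confident — can be sketched generically. This is a stand-in illustration, not the thesis's SSLGC: it uses random features in place of the graph-kernel representation, and the threshold `tau` and data generation are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in node features; in the thesis these would come from a graph
# kernel (or its low-rank approximation). Purely illustrative here.
d, T = 5, 200
X = rng.normal(size=(T, d))
w_true = rng.normal(size=d)
y = np.sign(X @ w_true)          # ±1 labels from a hidden linear rule

A = np.eye(d)                    # regularized second-moment matrix
b = np.zeros(d)                  # accumulated label-weighted features
tau = 0.5                        # confidence threshold (hypothetical)
queries = mistakes = 0

for t in range(T):
    x = X[t]
    w = np.linalg.solve(A, b)    # current ridge-regression weights
    margin = w @ x               # signed confidence of the prediction
    if np.sign(margin) != y[t]:
        mistakes += 1
    if abs(margin) < tau:        # low confidence -> pay to see the label
        queries += 1
        A += np.outer(x, x)      # second-order update on queried rounds
        b += y[t] * x
```

The learner thus spends its label budget only on rounds where the margin is small, which is exactly the trade-off that SSLGC's regret and label-complexity bounds quantify on graphs.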
【 Preview 】
Attachments
Files | Size | Format | View
Online and active learning of big networks: theory and algorithms | 1220KB | PDF | download
Document Metrics
Downloads: 10   Views: 95