Dissertation Details
Information-theoretic bounds in learning algorithms
Keywords: active and adaptive learning; model change detection; mutual information based generalization bounds; model compression
Bu, Yuheng
Others: https://www.ideals.illinois.edu/bitstream/handle/2142/106140/BU-DISSERTATION-2019.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】
The focus of this thesis is on understanding machine learning algorithms from an information-theoretic point of view. More specifically, we apply information-theoretic tools to construct performance bounds for learning algorithms, with the goal of deepening the understanding of current algorithms and inspiring new learning techniques.

The first problem considered involves a sequence of machine learning problems that vary in a bounded manner from one time step to the next. To solve these problems in an accurate and data-efficient way, an active and adaptive learning framework is proposed, in which the labels of the most informative samples are actively queried from an unlabeled data pool, and adaptation to the change is achieved by utilizing the information acquired in previous steps. The goal is to satisfy a pre-specified bound on the excess risk at each time step. More specifically, the design of the active querying algorithm is based on minimizing the excess risk using stochastic gradient descent in the maximum likelihood estimation setting. Our algorithm and theoretical results are validated by experiments with synthetic and real data.

To determine whether the active and adaptive learning framework is applicable in practice, we then study the problem of model change detection. Two sets of samples are generated according to a pre-change probabilistic model with parameter θ and a post-change model with parameter θ′, respectively. The goal is to detect whether the change in the model is significant. We construct an empirical difference test (EDT), which has low computational complexity. Moreover, we provide an approximation method for setting the threshold of the EDT to meet the false-alarm constraint.
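The model change detection setup described above can be illustrated with a minimal sketch for the linear regression case. The test statistic and fixed threshold below are simplified stand-ins chosen for illustration; the thesis's actual EDT uses a refined statistic and an approximation method to set its threshold under a false-alarm constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    # Maximum likelihood estimate for linear regression with Gaussian
    # noise = ordinary least squares.
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta_hat

# Pre-change samples drawn under theta, post-change samples under theta'.
n, d = 200, 3
theta = np.array([1.0, -2.0, 0.5])
theta_post = theta + np.array([0.8, 0.0, 0.0])  # change in one coordinate

X1 = rng.normal(size=(n, d)); y1 = X1 @ theta + 0.1 * rng.normal(size=n)
X2 = rng.normal(size=(n, d)); y2 = X2 @ theta_post + 0.1 * rng.normal(size=n)

theta_hat_pre = fit_linear(X1, y1)
theta_hat_post = fit_linear(X2, y2)

# Illustrative statistic: excess empirical risk of the pre-change fit on
# the post-change samples, relative to the post-change fit. A large value
# indicates a significant model change.
def risk(th):
    return np.mean((y2 - X2 @ th) ** 2)

stat = risk(theta_hat_pre) - risk(theta_hat_post)
threshold = 0.05  # in practice, set to meet the false-alarm constraint
print("change detected:", stat > threshold)
```

With the parameter shift above, the pre-change fit incurs a clearly larger empirical risk on the post-change data, so the statistic exceeds the threshold and the change is flagged.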
Experiments with linear regression and logistic regression are conducted to validate the proposed algorithms.

Another key contribution of this thesis is in the area of mutual information-based generalization error bounds for supervised learning algorithms. Our bound is constructed in terms of the mutual information between each individual training sample and the output of the learning algorithm; it requires weaker conditions on the loss function and provides a tighter characterization of the generalization error than existing studies. Examples are further provided to demonstrate that the proposed bound is tighter and has a broader range of applicability.

Finally, an application of this mutual information-based generalization error bound is considered. We show that model compression can improve the population risk of a pre-trained model, by studying the tradeoff between the decrease in the generalization error and the increase in the empirical risk induced by model compression. We first prove that model compression reduces the mutual information-based generalization error bound; this allows model compression to be interpreted as a regularization technique that avoids overfitting. We then characterize the increase in empirical risk with model compression using rate-distortion theory. A linear regression example shows that such an improvement in population risk due to model compression is indeed possible. Our theoretical results further suggest that the Hessian-weighted K-means clustering compression approach can be improved by regularizing the distance between the cluster centers. We provide experiments with neural networks to support our theoretical assertions.
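As a sketch of the kind of bound the abstract refers to, an individual-sample mutual information bound can be written as follows, in notation assumed here: W is the output of the learning algorithm, S = (Z_1, …, Z_n) the training set, L_μ and L_S the population and empirical risks, and the loss ℓ(w, Z) is assumed σ-sub-Gaussian under the data distribution μ for every w:

```latex
\left| \mathbb{E}\!\left[ L_\mu(W) - L_S(W) \right] \right|
  \;\le\; \frac{1}{n} \sum_{i=1}^{n} \sqrt{2\sigma^2 \, I(W; Z_i)}
```

In contrast to bounds stated in terms of the full mutual information I(W; S), each individual term I(W; Z_i) can remain finite even when I(W; S) is unbounded, which is one sense in which such a bound has a broader range of applicability.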
【 Preview 】
Attachment list
File | Size | Format | View
Information-theoretic bounds in learning algorithms | 1225 KB | PDF | download
Document metrics
Downloads: 2    Views: 4