Recent technological advances have facilitated the collection and distribution of a plethora of increasingly diverse and complex data. Supervised learning has become the toolbox of choice for exploiting these data to study and model numerous natural and social phenomena. These learning techniques typically require substantial amounts of training data in order to induce good solutions. However, generating annotation often places a significant burden on human experts and makes supervised learning methods costly to apply. On the other hand, the data itself often provides hints sufficient to induce high-quality supervision, and utilizing these hints can be substantially less labor intensive than producing explicit annotation. This thesis introduces a framework we call Learning with Incidental Supervision, which formalizes these concepts. In particular, we show that various aspects of the data often contain cues capable of inducing weak supervision signals, which can in turn be aggregated to produce high-quality annotation.
We examine both the derivation of these signals and the aggregation of their predictions in the context of concrete learning tasks, making independent contributions in both cases. We use the task of Named Entity Discovery to demonstrate that inherent properties of unsupervised multilingual data readily available online can be used to derive multiple weak supervision signals capable of inducing named entity annotation in a new language. We show that combining these signals can substantially improve the resulting annotation. Next, we introduce a general unsupervised learning framework for aggregating predictions from multiple weak supervision sources in order to induce high-quality annotation. We exploit agreement between the signals to estimate their relative quality and learn an effective aggregation model. The mathematical and algorithmic aggregation framework can in principle be applied to combining arbitrary types of predictions, and has a large number of applications on its own. We instantiate it and demonstrate its effectiveness for combining permutations, top-k lists, and dependency parses.