The world is full of tiny but useful objects, such as the door handle of a car or the light switch in a room. Such objects are barely visible in an image and can be well approximated by a single point. We refer to these small objects as landmarks, extending the more common usage of the term for anatomical or facial landmarks. Landmark localization refers to the detection of one or more such landmarks in an image. In this dissertation, we describe methods for localizing such landmarks in images. Localizing these landmarks automatically is hard because they usually lack a distinctive appearance of their own; they are largely defined by their context. The absence of local appearance cues necessitates effective modeling of context to achieve good localization performance. This context can be explicit, in the form of other spatially related landmarks, or implicit, in the form of recurring, informative patterns that may not have a semantic label. We describe methods that model both explicit and implicit context to improve landmark localization performance.

Localization performance is tied to the underlying learning machinery. Deep neural networks have proven highly successful in computer vision applications, and this dissertation employs them as the underlying learning machine. We describe a method that uses a deep neural network with a residual architecture, a recently proposed architecture for image classification, and improves its performance with a novel stochastic training method called Swapout.
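As a rough illustration of what a stochastic training method for residual networks can look like, the following is a minimal sketch of a Swapout unit, assuming its published formulation: the output is y = Θ1 ⊙ x + Θ2 ⊙ F(x), where x is the block input, F(x) is the residual branch output, and Θ1, Θ2 are independent per-unit Bernoulli masks. The function names `swapout` and `swapout_mc` and the keep probabilities `p1` and `p2` are hypothetical, and plain Python lists stand in for tensors; this is not the dissertation's implementation.

```python
import random


def swapout(x, fx, p1=0.5, p2=0.5, rng=None):
    """Hypothetical sketch of one Swapout forward pass (training mode).

    Computes y_i = t1_i * x_i + t2_i * fx_i, where t1_i ~ Bernoulli(p1) and
    t2_i ~ Bernoulli(p2) are drawn independently for every unit i.
    x  : input to the residual block (list of numbers)
    fx : output of the residual branch F(x), same length as x
    """
    rng = rng or random.Random()
    y = []
    for xi, fi in zip(x, fx):
        t1 = 1 if rng.random() < p1 else 0  # keep the skip (identity) path?
        t2 = 1 if rng.random() < p2 else 0  # keep the residual branch?
        # Each unit outputs one of 0, x_i, F(x)_i, or x_i + F(x)_i.
        y.append(t1 * xi + t2 * fi)
    return y


def swapout_mc(x, fx, p1=0.5, p2=0.5, samples=30, seed=0):
    """Hypothetical test-time sketch: average several stochastic passes."""
    rng = random.Random(seed)
    totals = [0] * len(x)
    for _ in range(samples):
        for i, yi in enumerate(swapout(x, fx, p1, p2, rng)):
            totals[i] += yi
    return [t / samples for t in totals]


# p1 = p2 = 1 reduces to a standard residual unit: y = x + F(x).
print(swapout([1, 2], [10, 20], p1=1.0, p2=1.0))  # [11, 22]
# p1 = 1, p2 = 0 skips the residual branch entirely: y = x.
print(swapout([1, 2], [10, 20], p1=1.0, p2=0.0))  # [1, 2]
```

Because each unit independently chooses among dropping, skipping, or keeping the residual branch, a single network implicitly samples from a large family of architectures during training; the special cases above show that an ordinary residual unit and an identity skip are both members of that family.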