In this thesis I introduce visual phrases, complex visual composites like ``a person riding a horse''. Visual phrases often display significantly reduced visual complexity compared to their component objects, because the appearance of those objects can change profoundly when they participate in relations. I introduce a dataset suitable for phrasal recognition that uses familiar PASCAL object categories, and demonstrate significant experimental gains from exploiting visual phrases. I show that a visual phrase detector significantly outperforms a baseline that detects component objects and then reasons about relations, even though visual phrase training sets tend to be smaller than those for objects. I argue that any multi-class detection system must decode detector outputs to produce final results; this is usually done with non-maximum suppression. I describe a novel decoding procedure that accurately accounts for local context without solving difficult inference problems, and I show that it outperforms the state of the art. Finally, I show that decoding a combination of phrasal and object detectors produces real improvements in detection results.
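For reference, the conventional decoding step named above is greedy non-maximum suppression. The sketch below is a minimal illustration of that standard baseline, not the decoding procedure proposed in this thesis; the box format (x1, y1, x2, y2), the 0.5 overlap threshold, and the function names are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def greedy_nms(detections, overlap_thresh=0.5):
    """Standard greedy non-maximum suppression (illustrative baseline only).

    detections: list of (score, box) pairs; returns the detections kept.
    """
    kept = []
    # Visit detections in order of decreasing confidence.
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Keep a detection only if it does not overlap a stronger kept one.
        if all(iou(box, kept_box) < overlap_thresh for _, kept_box in kept):
            kept.append((score, box))
    return kept
```

Because this greedy rule considers only pairwise overlap with stronger detections, it ignores the local context that the decoding procedure described in this thesis is designed to exploit.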