In this thesis, we analyze failure cases of state-of-the-art detectors and observe that most hard false positives are caused by misclassification rather than mislocalization, and that they substantially degrade detection performance. We hypothesize three factors behind hard false positives and confirm the hypothesis with experiments showing that: (1) a shared feature representation is sub-optimal, because feature learning for classification and for localization pursues mismatched goals; (2) a large receptive field shared across object scales introduces redundant context information for small objects; and (3) multi-task learning helps, yet optimizing the multi-task loss may be sub-optimal for each individual task. We demonstrate the potential of improving detector classification with a simple, effective, and widely applicable Decoupled Classification Refinement (DCR) network. DCR places a separate classification network in parallel with the localization network (the base detector). By placing ROI pooling at an early stage of the classification network, DCR enforces an adaptive receptive field. During training, DCR samples hard false positives from the base detector and trains a strong classifier to refine the classification results. During testing, DCR refines all boxes produced by the base detector. Experiments show competitive results on PASCAL VOC and COCO without any bells and whistles. Our code is available at: https://github.com/bowenc0221/Decoupled-Classification-Refinement.
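The training and testing procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the DCR implementation: the helper names (`iou`, `sample_hard_false_positives`, `refine_scores`) and the thresholds `score_thresh=0.3` and `iou_thresh=0.5` are hypothetical, and fusing the two classifiers by multiplying their scores is an assumed combination rule. A hard false positive is taken here to be a detection that the base detector scores confidently but that does not overlap any ground-truth box enough to count as a true positive.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, base-detector score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sample_hard_false_positives(
    dets: List[Detection],
    gts: List[Box],
    score_thresh: float = 0.3,  # hypothetical confidence threshold
    iou_thresh: float = 0.5,    # hypothetical true-positive IoU threshold
) -> List[Detection]:
    """Keep confident detections whose best IoU with any ground truth is low."""
    hard = []
    for box, score in dets:
        best = max((iou(box, g) for g in gts), default=0.0)
        if score >= score_thresh and best < iou_thresh:
            hard.append((box, score))
    return hard


def refine_scores(base_scores: List[float], dcr_scores: List[float]) -> List[float]:
    """Test time: rescore every base-detector box (assumed product fusion)."""
    return [b * d for b, d in zip(base_scores, dcr_scores)]


if __name__ == "__main__":
    gts = [(0.0, 0.0, 10.0, 10.0)]
    dets = [
        ((0.0, 0.0, 10.0, 10.0), 0.9),    # true positive
        ((20.0, 20.0, 30.0, 30.0), 0.8),  # confident, no overlap: hard FP
        ((22.0, 22.0, 28.0, 28.0), 0.1),  # low score: not sampled
    ]
    print(sample_hard_false_positives(dets, gts))
    print(refine_scores([0.9, 0.8], [0.5, 0.25]))
```

In this sketch the sampler only decides which base-detector boxes feed the separate classifier during training; at test time every box is rescored, so a confident false positive can be suppressed when the refinement classifier assigns it a low score.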