Over the past few years, deep neural networks have driven substantial progress in machine translation, but comparatively little work has explored using images to aid translation. In this study, we investigate several ways of incorporating image features into machine translation models. We start with a monomodal translation model that uses only textual features. We then extend this model into a multimodal system that incorporates visual features associated with the source sentence. We also propose a multitask system that uses an image captioning task to aid the translation task. Our models are evaluated on multiple datasets using the automatic metrics METEOR and BLEU. The experiments show that the proposed models outperform the text-only baseline.