There are many discourse phenomena in machine translation that require inter-sentential context for disambiguation. I will start by giving an overview of some difficult discourse phenomena that require the MT system to look beyond the current sentence to generate correct translations. Then, I will discuss methods that we have recently proposed to understand and improve context usage in such translation settings. To understand context usage, I will introduce two methods and discuss the insights derived from them. The first is a new metric, conditional cross-mutual information, which uses the difference between model log probabilities with and without context to quantify how much context-aware MT models actually use context. I will then introduce a method for eliciting "gold-standard" contexts from human translators, and SCAT, a new English-French dataset comprising supporting context words for 14K translations that professional translators found useful for pronoun disambiguation. To improve context usage, I will introduce two methods inspired by these insights. The first, context-aware word dropout, is a simple method that drops words from the current sentence to increase the reliance of predictions on context. The second is a guided attention strategy that encourages agreement between model attention and human-annotated important context. We demonstrate that both methods increase translation accuracy, particularly on discourse phenomena where the context is salient.
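To make the metric concrete, here is a minimal sketch of how conditional cross-mutual information could be estimated as described: average, over a held-out set, the difference between the model's log probability of the target with context and without it. The scoring function `model_logprob` and the example format are assumptions for illustration, not the authors' actual interface.

```python
def cxmi(model_logprob, examples):
    """Estimate conditional cross-mutual information (sketch).

    model_logprob(source, target, context) is a hypothetical function
    returning log p(target | source, context); passing context=None
    scores the same pair without context. A positive CXMI means the
    model assigns higher probability to the reference when context
    is available, i.e. it is actually using the context.
    """
    total = 0.0
    for source, target, context in examples:
        # Pointwise difference: log p(y | x, C) - log p(y | x)
        total += model_logprob(source, target, context) - model_logprob(source, target, None)
    return total / len(examples)
```

In practice the two log probabilities would come from the same context-aware model scored with and without the preceding sentences, so the difference isolates the contribution of context.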
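Context-aware word dropout, as described above, can be sketched in a few lines: tokens of the current source sentence are randomly replaced with a mask so that the model must lean on the surrounding context to recover the missing information. The function name, mask token, and dropout rate here are illustrative assumptions, not the paper's exact configuration.

```python
import random

def coword_dropout(tokens, p=0.1, mask="<mask>", seed=None):
    """Context-aware word dropout (sketch).

    Each token of the current source sentence is independently replaced
    by `mask` with probability `p` during training, encouraging the
    model to rely on inter-sentential context for its predictions.
    """
    rng = random.Random(seed)
    return [mask if rng.random() < p else tok for tok in tokens]
```

Applied only at training time, this acts as a regularizer: the context sentences are left intact, so the cheapest way for the model to fill in a masked word is to attend to them.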