Document-level Business Scene Dialogue Translation sub-task



Many ready-to-use parallel corpora exist for training machine translation systems; however, most of them consist of written language, such as web crawls, news commentary, patents, and scientific papers. The few spoken-language parallel corpora are mostly either monologues by a single speaker (TED talks) or noisy (OpenSubtitles). Most other MT evaluation campaigns adopt written-language, monologue, or noisy dialogue parallel corpora for their translation tasks. The traditional ASPEC translation tasks are sentence-level, and their translation quality seems to be saturated. We think it is high time to move on to document-level evaluation. For the first year, WAT uses the BSD Corpus (the Business Scene Dialogue corpus) as the dataset, including training, development, and test data. Participants in this task must obtain a copy of the BSD corpus themselves.


Participants in this task need to translate all the sentences in the test.json file and submit the translations. For English-to-Japanese translation, every "en_sentence" must be translated into Japanese, and vice versa for Japanese-to-English translation.
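
As a rough illustration of collecting the source sentences, the Python sketch below walks a small inline stand-in for test.json and gathers every "en_sentence" in document order. The layout it assumes (a list of scenarios, each with a "conversation" list whose entries carry "en_sentence" and "ja_sentence") is an assumption and should be checked against your own copy of the BSD corpus; the helper name collect_sentences and the sample data are hypothetical.

```python
import json


def collect_sentences(scenarios, field="en_sentence"):
    """Return the value of `field` for every utterance, in document order."""
    sentences = []
    for scenario in scenarios:
        # Assumed layout: each scenario holds its utterances under "conversation".
        for utterance in scenario.get("conversation", []):
            sentences.append(utterance[field])
    return sentences


# Tiny inline stand-in for test.json (hypothetical data, assumed layout).
sample = json.loads("""
[
  {"id": "demo-1",
   "conversation": [
     {"no": 1, "en_sentence": "Good morning.", "ja_sentence": "おはようございます。"},
     {"no": 2, "en_sentence": "Shall we start the meeting?", "ja_sentence": "会議を始めましょうか。"}
   ]}
]
""")

# English-to-Japanese direction: these are the sentences to translate.
source_sentences = collect_sentences(sample, "en_sentence")
print(source_sentences)
```

For the Japanese-to-English direction, the same helper could be called with "ja_sentence" instead, assuming that field name matches the corpus.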


All the translated sentences must be contained in one text file with the following conditions:

The translations must be submitted to the automatic evaluation server.

Please note that you need to apply for WAT 2021 before submitting your translation results.


Automatic evaluation will be conducted by the automatic evaluation server. Sampled scenarios of the test data will be human-evaluated. The evaluation criteria will be announced later.


For general questions, comments, etc., please email "wat-organizer -at- googlegroups -dot- com". For questions related to this task, contact "nakazawa -at- logos -dot- t -dot- u-tokyo -dot- ac -dot- jp".

Last Modified: 2020-12-21