Document-level Scientific Paper Translation sub-task



Traditional ASPEC translation tasks are sentence-level, and their translation quality seems to be saturated. We think it is high time to move on to document-level evaluation. For the first year, we use ParaNatCom (a parallel English-Japanese abstract corpus made from Nature Communications articles) for the development and test sets of the Document-level Scientific Paper Translation sub-task. We cannot provide a document-level training corpus, but you can use ASPEC and any other extra resources.


This year the only translation direction is English-to-Japanese. We have split ParaNatCom into development and test sets. The file list of the development set is here and that of the test set is here. Participants in this task need to translate all the lines in the test set files under the abstracts directory of ParaNatCom (which contains the English sentences) and submit the translations.

There are reference (Japanese) translations under the abstracts-ja-1 and abstracts-ja-2 directories. You can use those of the development set for tuning your system, but you should not look at those for the test set.

Note that each file is composed of 3 lines:

  1. title of the paper
  2. empty line
  3. abstract of the paper in one line
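The 3-line layout above can be checked mechanically before translating. The following is a minimal sketch, not part of the official task tooling; the function name parse_abstract_file is hypothetical, and the files are assumed to be UTF-8 text with at most one trailing newline.

```python
def parse_abstract_file(text):
    """Split the contents of one file into (title, abstract).

    Assumes the 3-line layout: title, empty line, abstract in one line.
    """
    # Drop a single trailing newline so it does not count as a fourth line.
    if text.endswith("\n"):
        text = text[:-1]
    # Tolerate Windows line endings by removing a trailing carriage return.
    lines = [line.rstrip("\r") for line in text.split("\n")]
    if len(lines) != 3:
        raise ValueError(f"expected 3 lines, found {len(lines)}")
    title, blank, abstract = lines
    if title.strip() == "" or blank.strip() != "" or abstract.strip() == "":
        raise ValueError("expected: title, empty line, abstract")
    return title, abstract
```

Calling parse_abstract_file on the text of one file returns the title and the abstract as two strings, and raises ValueError when the file does not follow the layout.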


Each test file must be translated in the same format as the original file, which means that each translated file must contain exactly 3 lines:

  1. translated title of the paper
  2. empty line
  3. translated abstract of the paper in one line

All the translated files must be compressed together into a single tar.gz file and submitted to "wat-organizer -at- googlegroups -dot- com".
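As an illustration of the packaging step, here is a minimal sketch that checks every translated file for the 3-line format and then writes one tar.gz archive. It is not an official submission script. The function names, the flat layout inside the archive, and the UTF-8 encoding are assumptions; the task description does not specify them.

```python
import tarfile
from pathlib import Path


def is_three_line_file(path):
    """Return True if the file holds a title, an empty line and an abstract."""
    text = Path(path).read_text(encoding="utf-8")
    # Ignore one trailing newline; anything beyond that counts as extra lines.
    if text.endswith("\n"):
        text = text[:-1]
    lines = text.split("\n")
    return (
        len(lines) == 3
        and lines[0].strip() != ""
        and lines[1].strip() == ""
        and lines[2].strip() != ""
    )


def make_submission(translated_dir, archive_path):
    """Pack every file in translated_dir into one tar.gz archive.

    Assumption: files are stored at the top level of the archive.
    """
    files = sorted(p for p in Path(translated_dir).iterdir() if p.is_file())
    bad = [p.name for p in files if not is_three_line_file(p)]
    if bad:
        raise ValueError(f"files not in the 3-line format: {bad}")
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in files:
            tar.add(p, arcname=p.name)
    return [p.name for p in files]
```

For example, make_submission("translations", "submission.tar.gz") would archive the files of a hypothetical translations directory, refusing to build the archive if any file breaks the format.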

Please note that you need to register for WAT2020 before submitting your translation results.


Sampled files of the test set will be human-evaluated. The evaluation criteria will be announced later.


For general questions, comments, etc., please email "wat-organizer -at- googlegroups -dot- com". For questions related to this task, contact "nakazawa -at- logos -dot- t -dot- u-tokyo -dot- ac -dot- jp".

Last Modified: 2020-08-10