WAT 2020

The 7th Workshop on Asian Translation

Baseline Systems

Tools for NMT


For all subtasks

OpenNMT version 0.9.7
BPE (except for my-en and km-en)
BPE without codec (for my-en and km-en)
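BPE here refers to byte-pair encoding subword segmentation. The following is a minimal sketch of the merge-learning loop, shown only to illustrate what "learning BPE" involves. It assumes a toy word-frequency vocabulary in which each word is written as space-separated characters ending in a `</w>` end-of-word marker. The functions `get_stats`, `merge_vocab` and `learn_bpe` are hypothetical illustrative names. They are not the BPE tool used by the baseline, and the sketch does not reproduce its exact options or the my-en/km-en variant.

```python
import collections
import re


def get_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs


def merge_vocab(pair, vocab):
    """Replace each standalone occurrence of `pair` with the merged symbol."""
    bigram = re.escape(" ".join(pair))
    # Only match the pair when it is bounded by whitespace or string edges,
    # so partial symbols are never merged.
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    """Greedily merge the most frequent pair `num_merges` times."""
    merges = []
    for _ in range(num_merges):
        pairs = get_stats(vocab)
        if not pairs:
            break
        # Ties are broken by first occurrence (max returns the first maximum).
        best = max(pairs, key=pairs.get)
        vocab = merge_vocab(best, vocab)
        merges.append(best)
    return merges, vocab


if __name__ == "__main__":
    toy_vocab = {
        "l o w </w>": 5,
        "l o w e r </w>": 2,
        "n e w e s t </w>": 6,
        "w i d e s t </w>": 3,
    }
    merges, merged = learn_bpe(toy_vocab, 3)
    print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
```

The learned merge list is what gets applied, in order, to segment new text into subword units; real tools add details such as frequency thresholds and vocabulary filtering that this sketch omits.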

For English tokenization

Moses version 4.0 or Moses version 2.1.1

For Japanese segmentation

Juman version 7.0

For Chinese segmentation

Stanford Word Segmenter version 2014-01-04
(using the Chinese Penn Treebank (CTB) model)

For Korean segmentation

mecab-ko
(https://bitbucket.org/eunjeon/mecab-ko/downloads/mecab-0.996-ko-0.9.1.tar.gz)
(https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-1.6.1-20140814.tar.gz)
Install procedure

For Hindi segmentation

Indic NLP Library (for tokenization, normalization, and script conversion)

For character conversion

scripts

For extracting texts from recipe JSON files

scripts

Java

Version 7 Update 60 (JDK, JRE)
(required by the Stanford Word Segmenter)

NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2020-07-08