WAT

The Workshop on Asian Translation

Evaluation Results

BLEU

#	Team	Task	Date/Time	DataID	BLEU										Method	Other Resources	System Description
#	Team	Task	Date/Time	DataID	juman	kytea	mecab	moses- tokenizer	stanford- segmenter- ctb	stanford- segmenter- pku	indic- tokenizer	unuse	myseg	kmseg	Method	Other Resources	System Description
1	IIIT-H	INDIC21en-or	2021/05/03 18:10:23	6011	-	-	-	-	-	-	20.15	-	-	-	NMT	No	MNMT system (En-XX) trained via exploiting lexical similarity on PMI+CVIT parallel corpus, then improved using back translation on PMI monolingual data followed by fine tuning.
2	SRPOL	INDIC21en-or	2021/05/04 15:21:35	6238	-	-	-	-	-	-	19.94	-	-	-	NMT	No	Ensemble of one-to-many on all data. Pretrained on BT, finetuned on PMI
3	NICT-5	INDIC21en-or	2021/06/25 11:38:49	6489	-	-	-	-	-	-	19.20	-	-	-	NMT	No	Using PMI and PIB data for fine-tuning on a mbart model trained for over 5 epochs. MNMT model.
4	SRPOL	INDIC21en-or	2021/05/04 16:27:26	6264	-	-	-	-	-	-	19.15	-	-	-	NMT	No	One-to-many on all data. Pretrained on BT, finetuned on PMI
5	CFILT	INDIC21en-or	2021/05/04 01:05:15	6048	-	-	-	-	-	-	18.22	-	-	-	NMT	No	Multilingual(One-to-Many(En-XX)) NMT model based on Transformer with shared encoder and decoder.
6	sakura	INDIC21en-or	2021/05/04 04:13:42	6157	-	-	-	-	-	-	17.88	-	-	-	NMT	No	Pre-training multilingual mBART one2many model with training corpus followed by finetuning on PMI Parallel.
7	sakura	INDIC21en-or	2021/05/01 11:34:25	5888	-	-	-	-	-	-	17.81	-	-	-	NMT	No	Fine-tuning of multilingual mBART one2many model with training corpus.
8	mcairt	INDIC21en-or	2021/05/03 17:20:41	5996	-	-	-	-	-	-	17.71	-	-	-	NMT	No	multilingual model(one to many model) trained on all WAT 2021 data by using base transformer.
9	NICT-5	INDIC21en-or	2021/04/22 11:53:17	5360	-	-	-	-	-	-	16.69	-	-	-	NMT	No	MBART+MNMT. Beam 4.
10	coastal	INDIC21en-or	2021/05/04 01:39:21	6084	-	-	-	-	-	-	15.66	-	-	-	NMT	No	seq2seq model trained on all WAT2021 data
11	NICT-5	INDIC21en-or	2021/04/21 15:44:24	5285	-	-	-	-	-	-	15.01	-	-	-	NMT	No	Pretrain MBART on IndicCorp and FT on bilingual PMI data. Beam search. Model is bilingual.
12	NLPHut	INDIC21en-or	2021/03/19 16:29:30	4596	-	-	-	-	-	-	12.81	-	-	-	NMT	No	Transformer with target language tag trained using all languages PMI data. Then fine-tuned using all en-or data.
13	IITP-MT	INDIC21en-or	2021/05/04 18:05:32	6293	-	-	-	-	-	-	12.57	-	-	-	NMT	No	One-to-Many model trained on all training data with base Transformer. All indic language data is romanized. Model fine-tuned on BT PMI monolingual corpus.
14	SRPOL	INDIC21en-or	2021/04/21 19:21:19	5321	-	-	-	-	-	-	12.33	-	-	-	SMT	No	Base transformer on all WAT21 data
15	ORGANIZER	INDIC21en-or	2021/04/08 17:24:28	4800	-	-	-	-	-	-	9.08	-	-	-	NMT	No	Bilingual baseline trained on PMI data. Transformer base. LR=10-3
16	gaurvar	INDIC21en-or	2021/05/01 19:32:30	5932	-	-	-	-	-	-	2.60	-	-	-	NMT	No
17	gaurvar	INDIC21en-or	2021/04/25 20:00:38	5584	-	-	-	-	-	-	2.20	-	-	-	NMT	No	Multi Task Multi Lingual T5 trained for Multiple Indic Languages

Notice:

This table is sorted by the leftmost segmenters. You can change the segmenter used to sort by clicking each segmenter link.
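
The sorting described above can be sketched as follows. This is a minimal illustration, not the site's implementation: the function name `sort_by_segmenter`, the dictionary layout of a row, and the choice to place rows without a score ("-") last are all assumptions. The three sample rows are taken from the BLEU table.

```python
def sort_by_segmenter(rows, segmenter):
    """Return rows ordered by the score under `segmenter`, highest first.

    Rows whose score is "-" (no score for that segmenter) are placed last,
    keeping their original relative order.
    """
    def sort_key(row):
        score = row[segmenter]
        if score == "-":
            return (1, 0.0)          # unscored rows sort after scored ones
        return (0, -float(score))    # negate so higher scores come first
    return sorted(rows, key=sort_key)


# Sample rows copied from the BLEU table (hypothetical dict layout).
rows = [
    {"team": "ORGANIZER", "data_id": 4800, "indic-tokenizer": "9.08", "juman": "-"},
    {"team": "IIIT-H", "data_id": 6011, "indic-tokenizer": "20.15", "juman": "-"},
    {"team": "SRPOL", "data_id": 6238, "indic-tokenizer": "19.94", "juman": "-"},
]

ranked = sort_by_segmenter(rows, "indic-tokenizer")
print([row["team"] for row in ranked])
```

For this task only the indic-tokenizer column carries scores, so sorting by any other segmenter leaves the rows in their input order.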

RIBES

#	Team	Task	Date/Time	DataID	RIBES										Method	Other Resources	System Description
#	Team	Task	Date/Time	DataID	juman	kytea	mecab	moses- tokenizer	stanford- segmenter- ctb	stanford- segmenter- pku	indic- tokenizer	unuse	myseg	kmseg	Method	Other Resources	System Description
1	SRPOL	INDIC21en-or	2021/05/04 15:21:35	6238	-	-	-	-	-	-	0.751086	-	-	-	NMT	No	Ensemble of one-to-many on all data. Pretrained on BT, finetuned on PMI
2	IIIT-H	INDIC21en-or	2021/05/03 18:10:23	6011	-	-	-	-	-	-	0.750260	-	-	-	NMT	No	MNMT system (En-XX) trained via exploiting lexical similarity on PMI+CVIT parallel corpus, then improved using back translation on PMI monolingual data followed by fine tuning.
3	SRPOL	INDIC21en-or	2021/05/04 16:27:26	6264	-	-	-	-	-	-	0.749740	-	-	-	NMT	No	One-to-many on all data. Pretrained on BT, finetuned on PMI
4	NICT-5	INDIC21en-or	2021/06/25 11:38:49	6489	-	-	-	-	-	-	0.747217	-	-	-	NMT	No	Using PMI and PIB data for fine-tuning on a mbart model trained for over 5 epochs. MNMT model.
5	mcairt	INDIC21en-or	2021/05/03 17:20:41	5996	-	-	-	-	-	-	0.743984	-	-	-	NMT	No	multilingual model(one to many model) trained on all WAT 2021 data by using base transformer.
6	sakura	INDIC21en-or	2021/05/01 11:34:25	5888	-	-	-	-	-	-	0.741763	-	-	-	NMT	No	Fine-tuning of multilingual mBART one2many model with training corpus.
7	sakura	INDIC21en-or	2021/05/04 04:13:42	6157	-	-	-	-	-	-	0.740263	-	-	-	NMT	No	Pre-training multilingual mBART one2many model with training corpus followed by finetuning on PMI Parallel.
8	CFILT	INDIC21en-or	2021/05/04 01:05:15	6048	-	-	-	-	-	-	0.738397	-	-	-	NMT	No	Multilingual(One-to-Many(En-XX)) NMT model based on Transformer with shared encoder and decoder.
9	NICT-5	INDIC21en-or	2021/04/22 11:53:17	5360	-	-	-	-	-	-	0.734028	-	-	-	NMT	No	MBART+MNMT. Beam 4.
10	coastal	INDIC21en-or	2021/05/04 01:39:21	6084	-	-	-	-	-	-	0.727477	-	-	-	NMT	No	seq2seq model trained on all WAT2021 data
11	NICT-5	INDIC21en-or	2021/04/21 15:44:24	5285	-	-	-	-	-	-	0.716665	-	-	-	NMT	No	Pretrain MBART on IndicCorp and FT on bilingual PMI data. Beam search. Model is bilingual.
12	IITP-MT	INDIC21en-or	2021/05/04 18:05:32	6293	-	-	-	-	-	-	0.714731	-	-	-	NMT	No	One-to-Many model trained on all training data with base Transformer. All indic language data is romanized. Model fine-tuned on BT PMI monolingual corpus.
13	SRPOL	INDIC21en-or	2021/04/21 19:21:19	5321	-	-	-	-	-	-	0.714550	-	-	-	SMT	No	Base transformer on all WAT21 data
14	NLPHut	INDIC21en-or	2021/03/19 16:29:30	4596	-	-	-	-	-	-	0.693696	-	-	-	NMT	No	Transformer with target language tag trained using all languages PMI data. Then fine-tuned using all en-or data.
15	ORGANIZER	INDIC21en-or	2021/04/08 17:24:28	4800	-	-	-	-	-	-	0.638520	-	-	-	NMT	No	Bilingual baseline trained on PMI data. Transformer base. LR=10-3
16	gaurvar	INDIC21en-or	2021/05/01 19:32:30	5932	-	-	-	-	-	-	0.431373	-	-	-	NMT	No
17	gaurvar	INDIC21en-or	2021/04/25 20:00:38	5584	-	-	-	-	-	-	0.380253	-	-	-	NMT	No	Multi Task Multi Lingual T5 trained for Multiple Indic Languages


AMFM

#	Team	Task	Date/Time	DataID	AMFM										Method	Other Resources	System Description
#	Team	Task	Date/Time	DataID	unuse	unuse	unuse	unuse	unuse	unuse	unuse	unuse	unuse	unuse	Method	Other Resources	System Description
1	SRPOL	INDIC21en-or	2021/05/04 15:21:35	6238	-	-	-	-	-	-	0.771831	-	-	-	NMT	No	Ensemble of one-to-many on all data. Pretrained on BT, finetuned on PMI
2	SRPOL	INDIC21en-or	2021/05/04 16:27:26	6264	-	-	-	-	-	-	0.771493	-	-	-	NMT	No	One-to-many on all data. Pretrained on BT, finetuned on PMI
3	sakura	INDIC21en-or	2021/05/04 04:13:42	6157	-	-	-	-	-	-	0.769884	-	-	-	NMT	No	Pre-training multilingual mBART one2many model with training corpus followed by finetuning on PMI Parallel.
4	CFILT	INDIC21en-or	2021/05/04 01:05:15	6048	-	-	-	-	-	-	0.768399	-	-	-	NMT	No	Multilingual(One-to-Many(En-XX)) NMT model based on Transformer with shared encoder and decoder.
5	sakura	INDIC21en-or	2021/05/01 11:34:25	5888	-	-	-	-	-	-	0.767385	-	-	-	NMT	No	Fine-tuning of multilingual mBART one2many model with training corpus.
6	mcairt	INDIC21en-or	2021/05/03 17:20:41	5996	-	-	-	-	-	-	0.763064	-	-	-	NMT	No	multilingual model(one to many model) trained on all WAT 2021 data by using base transformer.
7	coastal	INDIC21en-or	2021/05/04 01:39:21	6084	-	-	-	-	-	-	0.758199	-	-	-	NMT	No	seq2seq model trained on all WAT2021 data
8	NICT-5	INDIC21en-or	2021/04/22 11:53:17	5360	-	-	-	-	-	-	0.757804	-	-	-	NMT	No	MBART+MNMT. Beam 4.
9	NICT-5	INDIC21en-or	2021/04/21 15:44:24	5285	-	-	-	-	-	-	0.748319	-	-	-	NMT	No	Pretrain MBART on IndicCorp and FT on bilingual PMI data. Beam search. Model is bilingual.
10	IITP-MT	INDIC21en-or	2021/05/04 18:05:32	6293	-	-	-	-	-	-	0.737576	-	-	-	NMT	No	One-to-Many model trained on all training data with base Transformer. All indic language data is romanized. Model fine-tuned on BT PMI monolingual corpus.
11	NLPHut	INDIC21en-or	2021/03/19 16:29:30	4596	-	-	-	-	-	-	0.736638	-	-	-	NMT	No	Transformer with target language tag trained using all languages PMI data. Then fine-tuned using all en-or data.
12	IIIT-H	INDIC21en-or	2021/05/03 18:10:23	6011	-	-	-	-	-	-	0.735718	-	-	-	NMT	No	MNMT system (En-XX) trained via exploiting lexical similarity on PMI+CVIT parallel corpus, then improved using back translation on PMI monolingual data followed by fine tuning.
13	SRPOL	INDIC21en-or	2021/04/21 19:21:19	5321	-	-	-	-	-	-	0.723507	-	-	-	SMT	No	Base transformer on all WAT21 data
14	ORGANIZER	INDIC21en-or	2021/04/08 17:24:28	4800	-	-	-	-	-	-	0.714530	-	-	-	NMT	No	Bilingual baseline trained on PMI data. Transformer base. LR=10-3
15	gaurvar	INDIC21en-or	2021/05/01 19:32:30	5932	-	-	-	-	-	-	0.611704	-	-	-	NMT	No
16	gaurvar	INDIC21en-or	2021/04/25 20:00:38	5584	-	-	-	-	-	-	0.591864	-	-	-	NMT	No	Multi Task Multi Lingual T5 trained for Multiple Indic Languages
17	NICT-5	INDIC21en-or	2021/06/25 11:38:49	6489	-	-	-	-	-	-	0.000000	-	-	-	NMT	No	Using PMI and PIB data for fine-tuning on a mbart model trained for over 5 epochs. MNMT model.

Notice:

Adequacy-Fluency Metrics (AMFM) is a two-dimensional automatic evaluation metric for machine translation, designed to operate at the sentence level. It scores adequacy and fluency separately, decoupling the semantic and syntactic components of translation to give a balanced view of translation quality.
AMFM is calculated without tokenizers.
Details of AMFM are given in the paper "Adequacy–Fluency Metrics: Evaluating MT in the Continuous Space Model Framework" [pdf]. The invited talk at WAT2015 is also a helpful introduction [slide].

HUMAN (WAT2022)

Notice:

HUMAN (WAT2022) is the result of the Pairwise Crowdsourcing Evaluation on WAT2022.
For HUMAN (WAT2022), each translation was evaluated by 5 different workers, and the final decision was made by a vote over their judgements.

HUMAN (WAT2021)

#	Team	Task	Date/Time	DataID	HUMAN	Method	Other Resources	System Description
1	NLPHut	INDIC21en-or	2021/03/19 16:29:30	4596	Underway	NMT	No	Transformer with target language tag trained using all languages PMI data. Then fine-tuned using all en-or data.
2	NICT-5	INDIC21en-or	2021/04/21 15:44:24	5285	Underway	NMT	No	Pretrain MBART on IndicCorp and FT on bilingual PMI data. Beam search. Model is bilingual.
3	NICT-5	INDIC21en-or	2021/04/22 11:53:17	5360	Underway	NMT	No	MBART+MNMT. Beam 4.
4	gaurvar	INDIC21en-or	2021/04/25 20:00:38	5584	Underway	NMT	No	Multi Task Multi Lingual T5 trained for Multiple Indic Languages
5	gaurvar	INDIC21en-or	2021/05/01 19:32:30	5932	Underway	NMT	No
6	mcairt	INDIC21en-or	2021/05/03 17:20:41	5996	Underway	NMT	No	multilingual model(one to many model) trained on all WAT 2021 data by using base transformer.
7	IIIT-H	INDIC21en-or	2021/05/03 18:10:23	6011	Underway	NMT	No	MNMT system (En-XX) trained via exploiting lexical similarity on PMI+CVIT parallel corpus, then improved using back translation on PMI monolingual data followed by fine tuning.
8	CFILT	INDIC21en-or	2021/05/04 01:05:15	6048	Underway	NMT	No	Multilingual(One-to-Many(En-XX)) NMT model based on Transformer with shared encoder and decoder.
9	coastal	INDIC21en-or	2021/05/04 01:39:21	6084	Underway	NMT	No	seq2seq model trained on all WAT2021 data
10	sakura	INDIC21en-or	2021/05/04 04:13:42	6157	Underway	NMT	No	Pre-training multilingual mBART one2many model with training corpus followed by finetuning on PMI Parallel.
11	SRPOL	INDIC21en-or	2021/05/04 15:21:35	6238	Underway	NMT	No	Ensemble of one-to-many on all data. Pretrained on BT, finetuned on PMI
12	SRPOL	INDIC21en-or	2021/05/04 16:27:26	6264	Underway	NMT	No	One-to-many on all data. Pretrained on BT, finetuned on PMI
13	IITP-MT	INDIC21en-or	2021/05/04 18:05:32	6293	Underway	NMT	No	One-to-Many model trained on all training data with base Transformer. All indic language data is romanized. Model fine-tuned on BT PMI monolingual corpus.

Notice:

HUMAN (WAT2021) is the result of the Pairwise Crowdsourcing Evaluation on WAT2021.
For HUMAN (WAT2021), each translation was evaluated by 5 different workers, and the final decision was made by a vote over their judgements.
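
The voting step can be illustrated with a short sketch. The notice does not spell out the voting rule, so everything here is an assumption: the labels "win", "loss", and "tie", the function name `vote`, and the rule itself (the label with the most votes wins, and a draw between the top labels counts as "tie"). It is not the organizers' procedure.

```python
from collections import Counter


def vote(judgements):
    """Combine several workers' judgements into one decision.

    Assumed rule: the most frequent label wins; if the two most frequent
    labels have the same count, the decision is "tie".
    """
    if not judgements:
        raise ValueError("at least one judgement is required")
    counts = Counter(judgements).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "tie"
    return counts[0][0]


# Five hypothetical worker judgements for one translation.
print(vote(["win", "win", "loss", "tie", "win"]))
```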

HUMAN (WAT2020)

Notice:

HUMAN (WAT2020) is the result of the Pairwise Crowdsourcing Evaluation on WAT2020.
For HUMAN (WAT2020), each translation was evaluated by 5 different workers, and the final decision was made by a vote over their judgements.

HUMAN (WAT2019)

Notice:

HUMAN (WAT2019) is the result of the Pairwise Crowdsourcing Evaluation on WAT2019.
For HUMAN (WAT2019), each translation was evaluated by 5 different workers, and the final decision was made by a vote over their judgements.

HUMAN (WAT2018)

Notice:

HUMAN (WAT2018) is the result of the Pairwise Crowdsourcing Evaluation on WAT2018.
For HUMAN (WAT2018), each translation was evaluated by 5 different workers, and the final decision was made by a vote over their judgements.

HUMAN (WAT2017)

Notice:

HUMAN (WAT2017) is the result of the Pairwise Crowdsourcing Evaluation on WAT2017.
For HUMAN (WAT2017), each translation was evaluated by 5 different workers, and the final decision was made by a vote over their judgements.

HUMAN (WAT2016)

Notice:

HUMAN (WAT2016) is the result of the Pairwise Crowdsourcing Evaluation on WAT2016.
For HUMAN (WAT2016), each translation was evaluated by 5 different workers, and the final decision was made by a vote over their judgements.

HUMAN (WAT2015)

Notice:

HUMAN (WAT2015) is the result of the Pairwise Crowdsourcing Evaluation on WAT2015.
For HUMAN (WAT2015), each translation was evaluated by 5 different workers, and the final decision was made by a vote over their judgements.
Details of the evaluation can be found in the PDF document.

HUMAN (WAT2014)

Notice:

HUMAN (WAT2014) is the result of the Pairwise Crowdsourcing Evaluation on WAT2014.
For HUMAN (WAT2014), each translation was evaluated by 3 different workers, and the final decision was made by a vote over their judgements.
Details of the evaluation can be found in the PDF document.

EVALUATION RESULTS USAGE POLICY

When you use the WAT evaluation results for any purpose, such as:
- writing technical papers,
- making presentations about your system,
- advertising your MT system to customers,
you may use the information about translation directions, scores (both automatic and human evaluations), and the ranks of your system among the others. You may also use the scores of other systems, but you MUST anonymize the names of those systems. In addition, you may show the links (URLs) to the WAT evaluation result pages.
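
One way to follow the anonymization rule when preparing a results table is sketched below. The helper name `anonymize_teams`, the row layout, and the "System N" placeholder scheme are hypothetical; the policy only requires that other systems' names be anonymized, not any particular scheme. The sample scores are taken from the BLEU table.

```python
def anonymize_teams(rows, own_team):
    """Return copies of rows with every team except `own_team` renamed.

    Each other team gets a stable placeholder ("System 1", "System 2", ...),
    so several submissions from one team share the same placeholder.
    """
    aliases = {}
    result = []
    for row in rows:
        team = row["team"]
        if team != own_team:
            if team not in aliases:
                aliases[team] = f"System {len(aliases) + 1}"
            team = aliases[team]
        result.append({**row, "team": team})
    return result


rows = [
    {"team": "SRPOL", "bleu": 19.94},
    {"team": "IIIT-H", "bleu": 20.15},
    {"team": "SRPOL", "bleu": 19.15},
    {"team": "CFILT", "bleu": 18.22},
]

# Here CFILT plays the role of "your system"; all other names are replaced.
for row in anonymize_teams(rows, own_team="CFILT"):
    print(row["team"], row["bleu"])
```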

NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2018-08-02
