
WAT

The Workshop on Asian Translation
Evaluation Results


BLEU


# | Team | Task | Date/Time | DataID | BLEU | Method | Other Resources | System Description
(All scores are reported in the indic-tokenizer column; the juman, kytea, mecab, moses-tokenizer, stanford-segmenter-ctb, stanford-segmenter-pku, unuse, myseg and kmseg columns are empty and omitted below.)
1 | SRPOL | INDIC21en-te | 2021/05/04 15:23:24 | 6241 | 16.85 | NMT | No | Ensemble of one-to-many on all data. Pretrained on BT, finetuned on PMI
2 | SRPOL | INDIC21en-te | 2021/05/04 16:28:56 | 6267 | 16.82 | NMT | No | One-to-many on all data. Pretrained on BT, finetuned on PMI
3 | IIIT-H | INDIC21en-te | 2021/05/03 18:11:37 | 6014 | 15.61 | NMT | No | MNMT system (En-XX) trained via exploiting lexical similarity on PMI+CVIT parallel corpus, then improved using back translation on PMI monolingual data followed by fine tuning.
4 | CFILT | INDIC21en-te | 2021/05/04 01:07:21 | 6051 | 15.52 | NMT | No | Multilingual(One-to-Many(En-XX)) NMT model based on Transformer with shared encoder and decoder.
5 | sakura | INDIC21en-te | 2021/05/04 04:19:16 | 6160 | 15.48 | NMT | No | Pre-training multilingual mBART one2many model with training corpus followed by finetuning on PMI Parallel.
6 | coastal | INDIC21en-te | 2021/05/04 01:40:50 | 6088 | 12.86 | NMT | No | seq2seq model trained on all WAT2021 data
7 | sakura | INDIC21en-te | 2021/05/01 11:39:52 | 5891 | 11.86 | NMT | No | Fine-tuning of multilingual mBART one2many model with training corpus.
8 | mcairt | INDIC21en-te | 2021/05/03 17:25:18 | 5997 | 11.17 | NMT | No | multilingual model (one to many model) trained on all WAT 2021 data by using base transformer.
9 | SRPOL | INDIC21en-te | 2021/04/21 19:34:10 | 5334 | 10.65 | NMT | No | Base transformer on all WAT21 data
10 | IITP-MT | INDIC21en-te | 2021/05/04 18:18:33 | 6305 | 6.25 | NMT | No | One-to-Many model trained on all training data with base Transformer. All indic language data is romanized. Model fine-tuned on BT PMI monolingual corpus.
11 | NICT-5 | INDIC21en-te | 2021/06/25 11:39:47 | 6492 | 5.57 | NMT | No | Using PMI and PIB data for fine-tuning on a mbart model trained for over 5 epochs. MNMT model.
12 | NLPHut | INDIC21en-te | 2021/05/03 00:19:16 | 5986 | 4.88 | NMT | No | Transformer with source and target language tags trained using all languages PMI data. Then fine tuned using all en-te data.
13 | NICT-5 | INDIC21en-te | 2021/04/21 15:46:17 | 5291 | 4.59 | NMT | No | Pretrain MBART on IndicCorp and FT on bilingual PMI data. Beam search. Model is bilingual.
14 | NICT-5 | INDIC21en-te | 2021/04/22 11:54:22 | 5366 | 4.20 | NMT | No | MBART+MNMT. Beam 4.
15 | NLPHut | INDIC21en-te | 2021/03/20 00:20:23 | 4618 | 3.42 | NMT | No | Transformer with target language tag trained using all languages PMI data. Then fine tuned using en-te PMI data.
16 | ORGANIZER | INDIC21en-te | 2021/04/08 17:26:14 | 4806 | 2.80 | NMT | No | Bilingual baseline trained on PMI data. Transformer base. LR=10^-3
17 | gaurvar | INDIC21en-te | 2021/04/25 20:03:27 | 5587 | 2.31 | NMT | No | Multi Task Multi Lingual T5 trained for Multiple Indic Languages
18 | gaurvar | INDIC21en-te | 2021/05/01 19:34:57 | 5935 | 2.31 | NMT | No | Multi Task Multi Lingual T5 trained for Multiple Indic Languages
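As a rough illustration of how a corpus-level BLEU score of the kind listed above could be reproduced, the sketch below scores a system output file against a reference file with the sacrebleu library. The file names are hypothetical, and it assumes both files are already segmented with the same indic tokenizer used for this table (hence tokenize="none"); the official WAT evaluation pipeline may differ in detail.

# Minimal sketch (not the official WAT pipeline): corpus-level BLEU with sacrebleu.
# "system.te.tok" and "reference.te.tok" are hypothetical, pre-tokenized files.
import sacrebleu

with open("system.te.tok", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("reference.te.tok", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# tokenize="none" because the inputs are assumed pre-tokenized; sacrebleu
# reports BLEU on a 0-100 scale, matching the table above.
bleu = sacrebleu.corpus_bleu(hypotheses, [references], tokenize="none")
print(f"BLEU = {bleu.score:.2f}")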


RIBES


# | Team | Task | Date/Time | DataID | RIBES | Method | Other Resources | System Description
(All scores are reported in the indic-tokenizer column; the juman, kytea, mecab, moses-tokenizer, stanford-segmenter-ctb, stanford-segmenter-pku, unuse, myseg and kmseg columns are empty and omitted below.)
1 | SRPOL | INDIC21en-te | 2021/05/04 15:23:24 | 6241 | 0.739835 | NMT | No | Ensemble of one-to-many on all data. Pretrained on BT, finetuned on PMI
2 | SRPOL | INDIC21en-te | 2021/05/04 16:28:56 | 6267 | 0.734483 | NMT | No | One-to-many on all data. Pretrained on BT, finetuned on PMI
3 | IIIT-H | INDIC21en-te | 2021/05/03 18:11:37 | 6014 | 0.728432 | NMT | No | MNMT system (En-XX) trained via exploiting lexical similarity on PMI+CVIT parallel corpus, then improved using back translation on PMI monolingual data followed by fine tuning.
4 | sakura | INDIC21en-te | 2021/05/04 04:19:16 | 6160 | 0.725543 | NMT | No | Pre-training multilingual mBART one2many model with training corpus followed by finetuning on PMI Parallel.
5 | CFILT | INDIC21en-te | 2021/05/04 01:07:21 | 6051 | 0.725496 | NMT | No | Multilingual(One-to-Many(En-XX)) NMT model based on Transformer with shared encoder and decoder.
6 | coastal | INDIC21en-te | 2021/05/04 01:40:50 | 6088 | 0.707817 | NMT | No | seq2seq model trained on all WAT2021 data
7 | sakura | INDIC21en-te | 2021/05/01 11:39:52 | 5891 | 0.703612 | NMT | No | Fine-tuning of multilingual mBART one2many model with training corpus.
8 | mcairt | INDIC21en-te | 2021/05/03 17:25:18 | 5997 | 0.702337 | NMT | No | multilingual model (one to many model) trained on all WAT 2021 data by using base transformer.
9 | SRPOL | INDIC21en-te | 2021/04/21 19:34:10 | 5334 | 0.692362 | NMT | No | Base transformer on all WAT21 data
10 | NICT-5 | INDIC21en-te | 2021/06/25 11:39:47 | 6492 | 0.612627 | NMT | No | Using PMI and PIB data for fine-tuning on a mbart model trained for over 5 epochs. MNMT model.
11 | NICT-5 | INDIC21en-te | 2021/04/22 11:54:22 | 5366 | 0.576863 | NMT | No | MBART+MNMT. Beam 4.
12 | NLPHut | INDIC21en-te | 2021/05/03 00:19:16 | 5986 | 0.570112 | NMT | No | Transformer with source and target language tags trained using all languages PMI data. Then fine tuned using all en-te data.
13 | NICT-5 | INDIC21en-te | 2021/04/21 15:46:17 | 5291 | 0.569735 | NMT | No | Pretrain MBART on IndicCorp and FT on bilingual PMI data. Beam search. Model is bilingual.
14 | NLPHut | INDIC21en-te | 2021/03/20 00:20:23 | 4618 | 0.537365 | NMT | No | Transformer with target language tag trained using all languages PMI data. Then fine tuned using en-te PMI data.
15 | IITP-MT | INDIC21en-te | 2021/05/04 18:18:33 | 6305 | 0.530898 | NMT | No | One-to-Many model trained on all training data with base Transformer. All indic language data is romanized. Model fine-tuned on BT PMI monolingual corpus.
16 | ORGANIZER | INDIC21en-te | 2021/04/08 17:26:14 | 4806 | 0.479896 | NMT | No | Bilingual baseline trained on PMI data. Transformer base. LR=10^-3
17 | gaurvar | INDIC21en-te | 2021/04/25 20:03:27 | 5587 | 0.414016 | NMT | No | Multi Task Multi Lingual T5 trained for Multiple Indic Languages
18 | gaurvar | INDIC21en-te | 2021/05/01 19:34:57 | 5935 | 0.389727 | NMT | No | Multi Task Multi Lingual T5 trained for Multiple Indic Languages
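RIBES rewards correct word order: the official RIBES.py script multiplies a normalized Kendall's tau over aligned word positions by a unigram-precision term and a brevity penalty, each raised to a small exponent. The sketch below is a simplified illustration of that idea, not the official implementation: it aligns only words that occur exactly once in both hypothesis and reference (the real script also disambiguates repeated words using context), and the exponents alpha=0.25 and beta=0.10 are assumed defaults.

# Simplified sketch of the RIBES idea (not the official RIBES.py script).
from math import exp

def simplified_ribes(hyp_tokens, ref_tokens, alpha=0.25, beta=0.10):
    # Align only words occurring exactly once in both sentences; the official
    # script additionally disambiguates repeated words using context.
    ref_pos = {w: i for i, w in enumerate(ref_tokens) if ref_tokens.count(w) == 1}
    aligned = [ref_pos[w] for w in hyp_tokens
               if hyp_tokens.count(w) == 1 and w in ref_pos]
    if len(aligned) < 2:
        return 0.0
    pairs = len(aligned) * (len(aligned) - 1) // 2
    concordant = sum(1 for i in range(len(aligned))
                     for j in range(i + 1, len(aligned))
                     if aligned[i] < aligned[j])
    nkt = concordant / pairs                      # normalized Kendall's tau in [0, 1]
    precision = len(aligned) / len(hyp_tokens)    # unigram precision over aligned words
    bp = min(1.0, exp(1.0 - len(ref_tokens) / len(hyp_tokens)))  # brevity penalty
    return nkt * (precision ** alpha) * (bp ** beta)

# Sentence-level example; a corpus score averages the per-sentence values.
print(simplified_ribes("a b c d".split(), "a c b d".split()))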


AMFM


# | Team | Task | Date/Time | DataID | AMFM | Method | Other Resources | System Description
(The original layout has ten score sub-columns, all labelled "unuse"; only one contains values, so the empty columns are omitted below.)
1 | SRPOL | INDIC21en-te | 2021/05/04 16:28:56 | 6267 | 0.792970 | NMT | No | One-to-many on all data. Pretrained on BT, finetuned on PMI
2 | SRPOL | INDIC21en-te | 2021/05/04 15:23:24 | 6241 | 0.791085 | NMT | No | Ensemble of one-to-many on all data. Pretrained on BT, finetuned on PMI
3 | CFILT | INDIC21en-te | 2021/05/04 01:07:21 | 6051 | 0.789820 | NMT | No | Multilingual(One-to-Many(En-XX)) NMT model based on Transformer with shared encoder and decoder.
4 | sakura | INDIC21en-te | 2021/05/04 04:19:16 | 6160 | 0.785055 | NMT | No | Pre-training multilingual mBART one2many model with training corpus followed by finetuning on PMI Parallel.
5 | mcairt | INDIC21en-te | 2021/05/03 17:25:18 | 5997 | 0.783647 | NMT | No | multilingual model (one to many model) trained on all WAT 2021 data by using base transformer.
6 | IIIT-H | INDIC21en-te | 2021/05/03 18:11:37 | 6014 | 0.780218 | NMT | No | MNMT system (En-XX) trained via exploiting lexical similarity on PMI+CVIT parallel corpus, then improved using back translation on PMI monolingual data followed by fine tuning.
7 | coastal | INDIC21en-te | 2021/05/04 01:40:50 | 6088 | 0.778251 | NMT | No | seq2seq model trained on all WAT2021 data
8 | sakura | INDIC21en-te | 2021/05/01 11:39:52 | 5891 | 0.772064 | NMT | No | Fine-tuning of multilingual mBART one2many model with training corpus.
9 | IITP-MT | INDIC21en-te | 2021/05/04 18:18:33 | 6305 | 0.764977 | NMT | No | One-to-Many model trained on all training data with base Transformer. All indic language data is romanized. Model fine-tuned on BT PMI monolingual corpus.
10 | SRPOL | INDIC21en-te | 2021/04/21 19:34:10 | 5334 | 0.763271 | NMT | No | Base transformer on all WAT21 data
11 | NICT-5 | INDIC21en-te | 2021/04/21 15:46:17 | 5291 | 0.754015 | NMT | No | Pretrain MBART on IndicCorp and FT on bilingual PMI data. Beam search. Model is bilingual.
12 | NICT-5 | INDIC21en-te | 2021/04/22 11:54:22 | 5366 | 0.752068 | NMT | No | MBART+MNMT. Beam 4.
13 | NLPHut | INDIC21en-te | 2021/03/20 00:20:23 | 4618 | 0.749881 | NMT | No | Transformer with target language tag trained using all languages PMI data. Then fine tuned using en-te PMI data.
14 | NLPHut | INDIC21en-te | 2021/05/03 00:19:16 | 5986 | 0.713960 | NMT | No | Transformer with source and target language tags trained using all languages PMI data. Then fine tuned using all en-te data.
15 | ORGANIZER | INDIC21en-te | 2021/04/08 17:26:14 | 4806 | 0.708086 | NMT | No | Bilingual baseline trained on PMI data. Transformer base. LR=10^-3
16 | gaurvar | INDIC21en-te | 2021/05/01 19:34:57 | 5935 | 0.642502 | NMT | No | Multi Task Multi Lingual T5 trained for Multiple Indic Languages
17 | gaurvar | INDIC21en-te | 2021/04/25 20:03:27 | 5587 | 0.634376 | NMT | No | Multi Task Multi Lingual T5 trained for Multiple Indic Languages
18 | NICT-5 | INDIC21en-te | 2021/06/25 11:39:47 | 6492 | 0.000000 | NMT | No | Using PMI and PIB data for fine-tuning on a mbart model trained for over 5 epochs. MNMT model.


HUMAN (WAT2022)


# | Team | Task | Date/Time | DataID | HUMAN | Method | Other Resources | System Description
(No entries.)


HUMAN (WAT2021)


# | Team | Task | Date/Time | DataID | HUMAN | Method | Other Resources | System Description
1 | NICT-5 | INDIC21en-te | 2021/04/21 15:46:17 | 5291 | Underway | NMT | No | Pretrain MBART on IndicCorp and FT on bilingual PMI data. Beam search. Model is bilingual.
2 | NICT-5 | INDIC21en-te | 2021/04/22 11:54:22 | 5366 | Underway | NMT | No | MBART+MNMT. Beam 4.
3 | gaurvar | INDIC21en-te | 2021/04/25 20:03:27 | 5587 | Underway | NMT | No | Multi Task Multi Lingual T5 trained for Multiple Indic Languages
4 | gaurvar | INDIC21en-te | 2021/05/01 19:34:57 | 5935 | Underway | NMT | No | Multi Task Multi Lingual T5 trained for Multiple Indic Languages
5 | NLPHut | INDIC21en-te | 2021/05/03 00:19:16 | 5986 | Underway | NMT | No | Transformer with source and target language tags trained using all languages PMI data. Then fine tuned using all en-te data.
6 | mcairt | INDIC21en-te | 2021/05/03 17:25:18 | 5997 | Underway | NMT | No | multilingual model (one to many model) trained on all WAT 2021 data by using base transformer.
7 | IIIT-H | INDIC21en-te | 2021/05/03 18:11:37 | 6014 | Underway | NMT | No | MNMT system (En-XX) trained via exploiting lexical similarity on PMI+CVIT parallel corpus, then improved using back translation on PMI monolingual data followed by fine tuning.
8 | CFILT | INDIC21en-te | 2021/05/04 01:07:21 | 6051 | Underway | NMT | No | Multilingual(One-to-Many(En-XX)) NMT model based on Transformer with shared encoder and decoder.
9 | coastal | INDIC21en-te | 2021/05/04 01:40:50 | 6088 | Underway | NMT | No | seq2seq model trained on all WAT2021 data
10 | sakura | INDIC21en-te | 2021/05/04 04:19:16 | 6160 | Underway | NMT | No | Pre-training multilingual mBART one2many model with training corpus followed by finetuning on PMI Parallel.
11 | SRPOL | INDIC21en-te | 2021/05/04 15:23:24 | 6241 | Underway | NMT | No | Ensemble of one-to-many on all data. Pretrained on BT, finetuned on PMI
12 | SRPOL | INDIC21en-te | 2021/05/04 16:28:56 | 6267 | Underway | NMT | No | One-to-many on all data. Pretrained on BT, finetuned on PMI
13 | IITP-MT | INDIC21en-te | 2021/05/04 18:18:33 | 6305 | Underway | NMT | No | One-to-Many model trained on all training data with base Transformer. All indic language data is romanized. Model fine-tuned on BT PMI monolingual corpus.


HUMAN (WAT2020)


# | Team | Task | Date/Time | DataID | HUMAN | Method | Other Resources | System Description
(No entries.)


HUMAN (WAT2019)


# | Team | Task | Date/Time | DataID | HUMAN | Method | Other Resources | System Description
(No entries.)


HUMAN (WAT2018)


# | Team | Task | Date/Time | DataID | HUMAN | Method | Other Resources | System Description
(No entries.)


HUMAN (WAT2017)


# | Team | Task | Date/Time | DataID | HUMAN | Method | Other Resources | System Description
(No entries.)


HUMAN (WAT2016)


# | Team | Task | Date/Time | DataID | HUMAN | Method | Other Resources | System Description
(No entries.)


HUMAN (WAT2015)


# | Team | Task | Date/Time | DataID | HUMAN | Method | Other Resources | System Description
(No entries.)


HUMAN (WAT2014)


# | Team | Task | Date/Time | DataID | HUMAN | Method | Other Resources | System Description
(No entries.)


EVALUATION RESULTS USAGE POLICY

When you use the WAT evaluation results for any purpose, such as:
- writing technical papers,
- making presentations about your system, or
- advertising your MT system to customers,
you may use the information about translation directions, the scores (both automatic and human evaluations), and the rank of your system among the others. You may also use the scores of the other systems, but you MUST anonymize the other systems' names. In addition, you may show links (URLs) to the WAT evaluation result pages.

NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2018-08-02