WAT

The Workshop on Asian Translation

Evaluation Results

BLEU

#	Team	Task	Date/Time	DataID	BLEU										Method	Other Resources	System Description
#	Team	Task	Date/Time	DataID	juman	kytea	mecab	moses-tokenizer	stanford-segmenter-ctb	stanford-segmenter-pku	indic-tokenizer	unuse	myseg	kmseg	Method	Other Resources	System Description
1	00-7	MMCHMM24en-hi	2024/08/05 15:02:38	7190	-	-	-	-	-	-	53.40	-	-	-	NMT	Yes	challenge set
2	BITS-P	MMCHMM24en-hi	2023/07/08 13:44:39	7124	-	-	-	-	-	-	52.10	-	-	-	NMT	Yes	NLLB model finetuned on captions + object tags of original & synthetic images using DETR model
3	Volta	MMCHMM24en-hi	2021/05/25 13:56:03	6430	-	-	-	-	-	-	51.60	-	-	-	NMT	Yes	Finetuned mBART (Used IITB for data augmentation) and added object tags to the input using Mask RCNN
4	v036	MMCHMM24en-hi	2024/08/15 18:19:50	7406	-	-	-	-	-	-	43.20	-	-	-	NMT	No
5	ODIAGEN	MMCHMM24en-hi	2023/07/06 03:54:04	7106	-	-	-	-	-	-	42.80	-	-	-	NMT	No	Image features extracted as Object tags appended with text and MBART fine-tuning
6	v036	MMCHMM24en-hi	2024/08/11 12:43:40	7319	-	-	-	-	-	-	42.50	-	-	-	NMT	No	NMT based system using both image descriptors and text description. A multistage LLM pipeline used for extracting image data descriptions and translation. Fine tuning done in few cases Models Used:
7	v036	MMCHMM24en-hi	2024/08/11 23:48:23	7353	-	-	-	-	-	-	40.30	-	-	-	NMT	No
8	v036	MMCHMM24en-hi	2024/08/14 20:25:45	7397	-	-	-	-	-	-	39.70	-	-	-	NMT	No
9	CNLP-NITS-PP	MMCHMM24en-hi	2022/07/11 12:39:25	6741	-	-	-	-	-	-	39.30	-	-	-	NMT	No	Transliteration-based phrase pairs augmentation and visual features in training using BRNN encoder and doubly-attentive-rnn decoder.
10	CNLP-NITS-PP	MMCHMM24en-hi	2021/04/27 23:56:01	5730	-	-	-	-	-	-	39.28	-	-	-	NMT	Yes	Pretrained monolingual data (IITB) using Glove and fine-tuned with parallel data (WAT21 train data+ Extracted Phrase pairs from WAT21 train data +IITB train data) and visual features in training using
11	SILO_NLP	MMCHMM24en-hi	2022/07/18 17:33:08	6959	-	-	-	-	-	-	39.10	-	-	-	NMT	Yes	Object Tags (Image) + Flickr8 dataset as additional resource + Finetune mBART
12	239233	MMCHMM24en-hi	2024/08/13 07:05:33	7378	-	-	-	-	-	-	37.90	-	-	-	NMT	Yes	One-shot prompt for synthetic QA description from captions; translate QA using IndicTrans2; generate caption from QA as context
13	iitp	MMCHMM24en-hi	2021/05/01 23:25:56	5942	-	-	-	-	-	-	37.50	-	-	-	NMT	No	Removed special chars at start and end of sentence 1. pre-trained with HindEnCorp, trained with Vis Gen 2. trained with Visual Genome Selected best of two for each sentence according to translation
14	CNLP-NITS	MMCHMM24en-hi	2020/09/18 15:52:35	3894	-	-	-	-	-	-	33.57	-	-	-	NMT	Yes	Pretrained monolingual data (IITB) using Glove and fine-tuned with parallel data and visual features in training using BRNN encoder and doubly-attentive-rnn decoder.
15	v036	MMCHMM24en-hi	2024/08/11 13:55:39	7328	-	-	-	-	-	-	32.30	-	-	-	NMT	No	NMT based system using both image descriptors and text description. A multistage LLM pipeline used for extracting image data descriptions and translation. Fine tuning done in few cases Models Used:
16	DCU_NMT	MMCHMM24en-hi	2024/08/13 03:25:35	7372	-	-	-	-	-	-	30.30	-	-	-	NMT	No	NMT system trained on constrained resources using bert encoded context extracted from visual representation of training data. The context is used only on source side.
17	DCU_NMT	MMCHMM24en-hi	2024/08/11 23:07:08	7352	-	-	-	-	-	-	28.60	-	-	-	SMT	No	Context-aware model that uses image caption data extracted from images as context.
18	ORGANIZER	MMCHMM24en-hi	2020/11/07 01:35:40	4179	-	-	-	-	-	-	20.34	-	-	-	NMT	No
19	ORGANIZER	MMCHMM24en-hi	2020/11/07 01:39:32	4178	-	-	-	-	-	-	20.34	-	-	-	NMT	No
20	ORGANIZER	MMCHMM24en-hi	2020/11/07 01:47:44	4180	-	-	-	-	-	-	20.34	-	-	-	NMT	No
21	ODIAGEN	MMCHMM24en-hi	2024/08/15 17:20:58	7403	-	-	-	-	-	-	1.10	-	-	-	Other	No	LLM-based (LLava fine-tuned)
22	ODIAGEN	MMCHMM24en-hi	2024/08/15 17:57:50	7404	-	-	-	-	-	-	0.50	-	-	-	Other	No	LLM-based (LLava fine-tuned for 10 epochs)

Notice:

This table is sorted in descending order of the BLEU score under the leftmost segmenter column that contains scores, which for this task is indic-tokenizer.
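The ordering can be reproduced offline, or changed to another segmenter column, with a few lines of Python. The following is a minimal sketch, assuming the rows have been saved as tab-separated text; the column names are typed in by hand from the table header, and `make_row`, `parse_rows` and `sort_by_segmenter` are hypothetical helper names, not part of any WAT tool.

```python
import csv
import io

# Column names typed in by hand from the table header (an assumption of this sketch).
COLUMNS = [
    "#", "Team", "Task", "Date/Time", "DataID",
    "juman", "kytea", "mecab", "moses-tokenizer",
    "stanford-segmenter-ctb", "stanford-segmenter-pku",
    "indic-tokenizer", "unuse", "myseg", "kmseg",
    "Method", "Other Resources", "System Description",
]


def parse_rows(tsv_text):
    """Parse tab-separated result rows into dicts keyed by column name."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        # Rows without a system description are one field short; pad them.
        fields = fields + [""] * (len(COLUMNS) - len(fields))
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def sort_by_segmenter(rows, segmenter):
    """Sort by score, highest first; rows showing '-' keep their order at the end."""
    def key(row):
        value = row[segmenter]
        has_score = value != "-"
        return (has_score, float(value) if has_score else 0.0)
    return sorted(rows, key=key, reverse=True)


def make_row(rank, team, date, data_id, score, method, resources, description=""):
    """Hypothetical helper: build one line with the score under indic-tokenizer."""
    cells = ([rank, team, "MMCHMM24en-hi", date, data_id]
             + ["-"] * 6 + [score] + ["-"] * 3 + [method, resources])
    if description:
        cells.append(description)
    return "\t".join(cells)


# Three rows copied from the BLEU table, deliberately out of order.
sample = "\n".join([
    make_row("4", "v036", "2024/08/15 18:19:50", "7406", "43.20", "NMT", "No"),
    make_row("22", "ODIAGEN", "2024/08/15 17:57:50", "7404", "0.50", "Other", "No",
             "LLM-based (LLava fine-tuned for 10 epochs)"),
    make_row("1", "00-7", "2024/08/05 15:02:38", "7190", "53.40", "NMT", "Yes",
             "challenge set"),
])

rows = parse_rows(sample)
ranked = sort_by_segmenter(rows, "indic-tokenizer")
for row in ranked:
    print(row["Team"], row["indic-tokenizer"])
```

Sorting by a column that holds only "-" (for example juman in this task) leaves the input order unchanged, because Python's sort is stable.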

RIBES

#	Team	Task	Date/Time	DataID	RIBES										Method	Other Resources	System Description
#	Team	Task	Date/Time	DataID	juman	kytea	mecab	moses-tokenizer	stanford-segmenter-ctb	stanford-segmenter-pku	indic-tokenizer	unuse	myseg	kmseg	Method	Other Resources	System Description
1	Volta	MMCHMM24en-hi	2021/05/25 13:56:03	6430	-	-	-	-	-	-	0.859645	-	-	-	NMT	Yes	Finetuned mBART (Used IITB for data augmentation) and added object tags to the input using Mask RCNN
2	BITS-P	MMCHMM24en-hi	2023/07/08 13:44:39	7124	-	-	-	-	-	-	0.853388	-	-	-	NMT	Yes	NLLB model finetuned on captions + object tags of original & synthetic images using DETR model
3	00-7	MMCHMM24en-hi	2024/08/05 15:02:38	7190	-	-	-	-	-	-	0.842400	-	-	-	NMT	Yes	challenge set
4	ODIAGEN	MMCHMM24en-hi	2023/07/06 03:54:04	7106	-	-	-	-	-	-	0.815156	-	-	-	NMT	No	Image features extracted as Object tags appended with text and MBART fine-tuning
5	v036	MMCHMM24en-hi	2024/08/15 18:19:50	7406	-	-	-	-	-	-	0.812507	-	-	-	NMT	No
6	v036	MMCHMM24en-hi	2024/08/11 12:43:40	7319	-	-	-	-	-	-	0.801778	-	-	-	NMT	No	NMT based system using both image descriptors and text description. A multistage LLM pipeline used for extracting image data descriptions and translation. Fine tuning done in few cases Models Used:
7	v036	MMCHMM24en-hi	2024/08/11 23:48:23	7353	-	-	-	-	-	-	0.796730	-	-	-	NMT	No
8	239233	MMCHMM24en-hi	2024/08/13 07:05:33	7378	-	-	-	-	-	-	0.795538	-	-	-	NMT	Yes	One-shot prompt for synthetic QA description from captions; translate QA using IndicTrans2; generate caption from QA as context
9	v036	MMCHMM24en-hi	2024/08/14 20:25:45	7397	-	-	-	-	-	-	0.793972	-	-	-	NMT	No
10	CNLP-NITS-PP	MMCHMM24en-hi	2021/04/27 23:56:01	5730	-	-	-	-	-	-	0.792097	-	-	-	NMT	Yes	Pretrained monolingual data (IITB) using Glove and fine-tuned with parallel data (WAT21 train data+ Extracted Phrase pairs from WAT21 train data +IITB train data) and visual features in training using
11	CNLP-NITS-PP	MMCHMM24en-hi	2022/07/11 12:39:25	6741	-	-	-	-	-	-	0.791468	-	-	-	NMT	No	Transliteration-based phrase pairs augmentation and visual features in training using BRNN encoder and doubly-attentive-rnn decoder.
12	iitp	MMCHMM24en-hi	2021/05/01 23:25:56	5942	-	-	-	-	-	-	0.790809	-	-	-	NMT	No	Removed special chars at start and end of sentence 1. pre-trained with HindEnCorp, trained with Vis Gen 2. trained with Visual Genome Selected best of two for each sentence according to translation
13	SILO_NLP	MMCHMM24en-hi	2022/07/18 17:33:08	6959	-	-	-	-	-	-	0.784169	-	-	-	NMT	Yes	Object Tags (Image) + Flickr8 dataset as additional resource + Finetune mBART
14	CNLP-NITS	MMCHMM24en-hi	2020/09/18 15:52:35	3894	-	-	-	-	-	-	0.754141	-	-	-	NMT	Yes	Pretrained monolingual data (IITB) using Glove and fine-tuned with parallel data and visual features in training using BRNN encoder and doubly-attentive-rnn decoder.
15	v036	MMCHMM24en-hi	2024/08/11 13:55:39	7328	-	-	-	-	-	-	0.752134	-	-	-	NMT	No	NMT based system using both image descriptors and text description. A multistage LLM pipeline used for extracting image data descriptions and translation. Fine tuning done in few cases Models Used:
16	DCU_NMT	MMCHMM24en-hi	2024/08/11 23:07:08	7352	-	-	-	-	-	-	0.711555	-	-	-	SMT	No	Context-aware model that uses image caption data extracted from images as context.
17	DCU_NMT	MMCHMM24en-hi	2024/08/13 03:25:35	7372	-	-	-	-	-	-	0.710342	-	-	-	NMT	No	NMT system trained on constrained resources using bert encoded context extracted from visual representation of training data. The context is used only on source side.
18	ORGANIZER	MMCHMM24en-hi	2020/11/07 01:35:40	4179	-	-	-	-	-	-	0.644230	-	-	-	NMT	No
19	ORGANIZER	MMCHMM24en-hi	2020/11/07 01:39:32	4178	-	-	-	-	-	-	0.644230	-	-	-	NMT	No
20	ORGANIZER	MMCHMM24en-hi	2020/11/07 01:47:44	4180	-	-	-	-	-	-	0.644230	-	-	-	NMT	No
21	ODIAGEN	MMCHMM24en-hi	2024/08/15 17:20:58	7403	-	-	-	-	-	-	0.151195	-	-	-	Other	No	LLM-based (LLava fine-tuned)
22	ODIAGEN	MMCHMM24en-hi	2024/08/15 17:57:50	7404	-	-	-	-	-	-	0.116894	-	-	-	Other	No	LLM-based (LLava fine-tuned for 10 epochs)

Notice:

This table is sorted in descending order of the RIBES score under the leftmost segmenter column that contains scores, which for this task is indic-tokenizer.

AMFM

#	Team	Task	Date/Time	DataID	AMFM										Method	Other Resources	System Description
#	Team	Task	Date/Time	DataID	unuse	unuse	unuse	unuse	unuse	unuse	unuse	unuse	unuse	unuse	Method	Other Resources	System Description
1	Volta	MMCHMM24en-hi	2021/05/25 13:56:03	6430	-	-	-	-	-	-	0.877000	-	-	-	NMT	Yes	Finetuned mBART (Used IITB for data augmentation) and added object tags to the input using Mask RCNN
2	iitp	MMCHMM24en-hi	2021/05/01 23:25:56	5942	-	-	-	-	-	-	0.823429	-	-	-	NMT	No	Removed special chars at start and end of sentence 1. pre-trained with HindEnCorp, trained with Vis Gen 2. trained with Visual Genome Selected best of two for each sentence according to translation
3	CNLP-NITS-PP	MMCHMM24en-hi	2021/04/27 23:56:01	5730	-	-	-	-	-	-	0.817356	-	-	-	NMT	Yes	Pretrained monolingual data (IITB) using Glove and fine-tuned with parallel data (WAT21 train data+ Extracted Phrase pairs from WAT21 train data +IITB train data) and visual features in training using
4	CNLP-NITS	MMCHMM24en-hi	2020/09/18 15:52:35	3894	-	-	-	-	-	-	0.787320	-	-	-	NMT	Yes	Pretrained monolingual data (IITB) using Glove and fine-tuned with parallel data and visual features in training using BRNN encoder and doubly-attentive-rnn decoder.
5	ORGANIZER	MMCHMM24en-hi	2020/11/07 01:35:40	4179	-	-	-	-	-	-	0.669760	-	-	-	NMT	No
6	ORGANIZER	MMCHMM24en-hi	2020/11/07 01:39:32	4178	-	-	-	-	-	-	0.669760	-	-	-	NMT	No
7	ORGANIZER	MMCHMM24en-hi	2020/11/07 01:47:44	4180	-	-	-	-	-	-	0.669760	-	-	-	NMT	No
8	CNLP-NITS-PP	MMCHMM24en-hi	2022/07/11 12:39:25	6741	-	-	-	-	-	-	0.000000	-	-	-	NMT	No	Transliteration-based phrase pairs augmentation and visual features in training using BRNN encoder and doubly-attentive-rnn decoder.
9	SILO_NLP	MMCHMM24en-hi	2022/07/18 17:33:08	6959	-	-	-	-	-	-	0.000000	-	-	-	NMT	Yes	Object Tags (Image) + Flickr8 dataset as additional resource + Finetune mBART
10	ODIAGEN	MMCHMM24en-hi	2023/07/06 03:54:04	7106	-	-	-	-	-	-	0.000000	-	-	-	NMT	No	Image features extracted as Object tags appended with text and MBART fine-tuning
11	BITS-P	MMCHMM24en-hi	2023/07/08 13:44:39	7124	-	-	-	-	-	-	0.000000	-	-	-	NMT	Yes	NLLB model finetuned on captions + object tags of original & synthetic images using DETR model
12	00-7	MMCHMM24en-hi	2024/08/05 15:02:38	7190	-	-	-	-	-	-	0.000000	-	-	-	NMT	Yes	challenge set
13	v036	MMCHMM24en-hi	2024/08/11 12:43:40	7319	-	-	-	-	-	-	0.000000	-	-	-	NMT	No	NMT based system using both image descriptors and text description. A multistage LLM pipeline used for extracting image data descriptions and translation. Fine tuning done in few cases Models Used:
14	v036	MMCHMM24en-hi	2024/08/11 13:55:39	7328	-	-	-	-	-	-	0.000000	-	-	-	NMT	No	NMT based system using both image descriptors and text description. A multistage LLM pipeline used for extracting image data descriptions and translation. Fine tuning done in few cases Models Used:
15	DCU_NMT	MMCHMM24en-hi	2024/08/11 23:07:08	7352	-	-	-	-	-	-	0.000000	-	-	-	SMT	No	Context-aware model that uses image caption data extracted from images as context.
16	v036	MMCHMM24en-hi	2024/08/11 23:48:23	7353	-	-	-	-	-	-	0.000000	-	-	-	NMT	No
17	DCU_NMT	MMCHMM24en-hi	2024/08/13 03:25:35	7372	-	-	-	-	-	-	0.000000	-	-	-	NMT	No	NMT system trained on constrained resources using bert encoded context extracted from visual representation of training data. The context is used only on source side.
18	239233	MMCHMM24en-hi	2024/08/13 07:05:33	7378	-	-	-	-	-	-	0.000000	-	-	-	NMT	Yes	One-shot prompt for synthetic QA description from captions; translate QA using IndicTrans2; generate caption from QA as context
19	v036	MMCHMM24en-hi	2024/08/14 20:25:45	7397	-	-	-	-	-	-	0.000000	-	-	-	NMT	No
20	ODIAGEN	MMCHMM24en-hi	2024/08/15 17:20:58	7403	-	-	-	-	-	-	0.000000	-	-	-	Other	No	LLM-based (LLava fine-tuned)
21	ODIAGEN	MMCHMM24en-hi	2024/08/15 17:57:50	7404	-	-	-	-	-	-	0.000000	-	-	-	Other	No	LLM-based (LLava fine-tuned for 10 epochs)
22	v036	MMCHMM24en-hi	2024/08/15 18:19:50	7406	-	-	-	-	-	-	0.000000	-	-	-	NMT	No

Notice:

This table is sorted in descending order of AMFM score.
Adequacy-Fluency Metrics (AMFM) is a two-dimensional automatic evaluation metric for machine translation that operates at the sentence level. It scores adequacy and fluency separately, decoupling the semantic and syntactic components of translation to give a balanced view of translation quality.
AMFM is calculated without tokenizers.
Details of AMFM are given in the paper "Adequacy–Fluency Metrics: Evaluating MT in the Continuous Space Model Framework". The invited talk at WAT2015 is also a helpful introduction.

HUMAN (WAT2022)

Notice:

HUMAN (WAT2022) is the result of the Pairwise Crowdsourcing Evaluation at WAT2022.
Each translation was evaluated by 5 different workers, and the final decision was made by voting on their judgements.
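The notice does not spell out the voting rule. As an illustration only, the sketch below assumes each worker marks a translation as better than (+1), equal to (0) or worse than (-1) a baseline, and that the votes are summed and compared against a threshold. The function name `decide`, the numeric coding and the threshold of 2 are all assumptions of this example, not taken from this page.

```python
def decide(judgements, threshold=2):
    """Combine per-worker judgements (+1, 0, -1) into one decision.

    Assumed rule for illustration: sum the votes, then call it a win or a
    loss only if the sum reaches the threshold in either direction.
    """
    total = sum(judgements)
    if total >= threshold:
        return "win"
    if total <= -threshold:
        return "loss"
    return "tie"


# Five workers, as in the WAT2015 to WAT2022 evaluations.
print(decide([1, 1, 0, -1, 1]))    # votes sum to 2
print(decide([1, -1, 0, 0, 0]))    # votes sum to 0
print(decide([-1, -1, -1, 0, 1]))  # votes sum to -2
```

The same function works unchanged for a different number of workers, since it depends only on the sum of the votes.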

HUMAN (WAT2021)

Notice:

HUMAN (WAT2021) is the result of the Pairwise Crowdsourcing Evaluation at WAT2021.
Each translation was evaluated by 5 different workers, and the final decision was made by voting on their judgements.

HUMAN (WAT2020)

Notice:

HUMAN (WAT2020) is the result of the Pairwise Crowdsourcing Evaluation at WAT2020.
Each translation was evaluated by 5 different workers, and the final decision was made by voting on their judgements.

HUMAN (WAT2019)

Notice:

HUMAN (WAT2019) is the result of the Pairwise Crowdsourcing Evaluation at WAT2019.
Each translation was evaluated by 5 different workers, and the final decision was made by voting on their judgements.

HUMAN (WAT2018)

Notice:

HUMAN (WAT2018) is the result of the Pairwise Crowdsourcing Evaluation at WAT2018.
Each translation was evaluated by 5 different workers, and the final decision was made by voting on their judgements.

HUMAN (WAT2017)

Notice:

HUMAN (WAT2017) is the result of the Pairwise Crowdsourcing Evaluation at WAT2017.
Each translation was evaluated by 5 different workers, and the final decision was made by voting on their judgements.

HUMAN (WAT2016)

Notice:

HUMAN (WAT2016) is the result of the Pairwise Crowdsourcing Evaluation at WAT2016.
Each translation was evaluated by 5 different workers, and the final decision was made by voting on their judgements.

HUMAN (WAT2015)

Notice:

HUMAN (WAT2015) is the result of the Pairwise Crowdsourcing Evaluation at WAT2015.
Each translation was evaluated by 5 different workers, and the final decision was made by voting on their judgements.
Details of the evaluation can be found in the PDF document.

HUMAN (WAT2014)

Notice:

HUMAN (WAT2014) is the result of the Pairwise Crowdsourcing Evaluation at WAT2014.
Each translation was evaluated by 3 different workers, and the final decision was made by voting on their judgements.
Details of the evaluation can be found in the PDF document.

EVALUATION RESULTS USAGE POLICY

When you use the WAT evaluation results for any purpose, such as:
- writing technical papers,
- making presentations about your system,
- advertising your MT system to customers,
you may use the translation directions, the scores (both automatic and human evaluations) and the rank of your own system among the others. You may also use the scores of other systems, but you MUST anonymize the other systems' names. In addition, you may show links (URLs) to the WAT evaluation result pages.
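One way to meet the anonymization requirement when quoting scores is to replace every team name except your own with a neutral label. The sketch below is only an illustration: the function name `anonymize` and the "System N" labels are assumptions, and the policy itself does not prescribe any particular scheme.

```python
def anonymize(results, own_team):
    """Replace every team name except own_team with a neutral label.

    results is a list of (team, score) pairs. A team that appears more than
    once keeps the same label, so its submissions remain comparable.
    """
    aliases = {}
    anonymized = []
    for team, score in results:
        if team != own_team:
            # Labels are numbered in order of first appearance.
            team = aliases.setdefault(team, f"System {len(aliases) + 1}")
        anonymized.append((team, score))
    return anonymized


# BLEU scores copied from the table, quoted from the point of view of team Volta.
bleu = [("00-7", 53.40), ("BITS-P", 52.10), ("Volta", 51.60),
        ("v036", 43.20), ("v036", 42.50)]
for team, score in anonymize(bleu, own_team="Volta"):
    print(f"{team}\t{score:.2f}")
```

Whether repeated submissions from one team should share a label is a judgement call; giving each row its own label would reveal less.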

NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2018-08-02
