WAT

The Workshop on Asian Translation

Evaluation Results

BLEU

#	Team	Task	Date/Time	DataID	BLEU										Method	Other Resources	System Description
#	Team	Task	Date/Time	DataID	juman	kytea	mecab	moses-tokenizer	stanford-segmenter-ctb	stanford-segmenter-pku	indic-tokenizer	unuse	myseg	kmseg	Method	Other Resources	System Description
1	NICT-4	ALTmy-en	2018/09/13 15:36:40	2303	-	-	-	29.14	-	-	-	-	0.00	0.00	Other	Yes	Many PBSMT and NMT n-best lists combined and reranked. Use monolingual data for back-translation and language model trainings.
2	NICT-4	ALTmy-en	2018/09/13 14:31:16	2290	-	-	-	22.53	-	-	-	-	0.00	0.00	Other	No	Many PBSMT and NMT n-best lists combined and reranked
3	NICT-4	ALTmy-en	2018/08/23 10:29:46	2069	-	-	-	21.97	-	-	-	-	0.00	0.00	NMT	No	NMT baseline: ensemble
4	NICT	ALTmy-en	2018/09/14 10:13:17	2329	-	-	-	20.82	-	-	-	-	0.00	0.00	NMT	No	4 models ensemble
5	NICT-4	ALTmy-en	2018/08/23 10:28:04	2068	-	-	-	18.98	-	-	-	-	0.00	0.00	NMT	No	NMT baseline: single system
6	NICT	ALTmy-en	2018/09/12 15:33:34	2281	-	-	-	16.31	-	-	-	-	0.00	0.00	NMT	No	Single model
7	NICT-5	ALTmy-en	2018/08/22 18:57:56	2056	-	-	-	15.44	-	-	-	-	0.00	0.00	NMT	No	Simple Mixed Fine Tuning model using transformer.
8	ORGANIZER	ALTmy-en	2018/09/04 18:38:58	2228	-	-	-	14.44	-	-	-	-	0.00	0.00	NMT	No	NMT with Attention
9	ORGANIZER	ALTmy-en	2018/08/24 15:29:41	2141	-	-	-	14.24	-	-	-	-	0.00	0.00	Other	Yes	Online A
10	XMUNLP	ALTmy-en	2018/09/16 08:46:40	2456	-	-	-	12.71	-	-	-	-	0.00	0.00	NMT	No	single transformer model
11	XMUNLP	ALTmy-en	2018/09/15 16:42:12	2399	-	-	-	12.11	-	-	-	-	0.00	0.00	NMT	No	single rnnsearch model
12	Osaka-U	ALTmy-en	2018/09/15 22:59:06	2438	-	-	-	11.38	-	-	-	-	0.00	0.00	NMT	Yes	rewarding model
13	NICT-4	ALTmy-en	2018/08/23 10:39:02	2071	-	-	-	11.35	-	-	-	-	0.00	0.00	SMT	Yes	MSLR, with language model trained on common-crawl data.
14	Osaka-U	ALTmy-en	2018/09/16 11:56:52	2463	-	-	-	9.99	-	-	-	-	0.00	0.00	NMT	No	mixed fine tuning
15	UCSYNLP	ALTmy-en	2018/09/14 13:22:24	2332	-	-	-	9.56	-	-	-	-	0.00	0.00	NMT	No	NMT with Attention
16	NICT-4	ALTmy-en	2018/09/13 14:33:29	2291	-	-	-	9.47	-	-	-	-	0.00	0.00	SMT	No	with MSLR models, language models were trained on the target side of the parallel data
17	UCSYNLP	ALTmy-en	2018/09/15 15:44:56	2393	-	-	-	8.91	-	-	-	-	0.00	0.00	SMT	No	HPBSMT
18	UCSYNLP	ALTmy-en	2018/09/15 15:25:27	2391	-	-	-	8.84	-	-	-	-	0.00	0.00	SMT	No	OSM
19	UCSMNLP	ALTmy-en	2018/10/29 15:28:57	2549	-	-	-	6.01	-	-	-	-	0.00	0.00	SMT	No	Batch MIRA tuning
20	UCSMNLP	ALTmy-en	2018/09/14 15:32:10	2338	-	-	-	2.22	-	-	-	-	0.00	0.00	SMT	No	with PBSMT

Notice:

This table is sorted by the leftmost segmenters. You can change the segmenter used to sort by clicking each segmenter link.
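As a rough illustration of the sorting described in the notice, the sketch below parses a few tab-separated rows copied from the BLEU table and re-sorts them by a chosen segmenter column. The function names `parse_rows` and `sort_by_segmenter` are hypothetical, and treating a `-` entry as "no score" that sorts last is an assumption, not something the page specifies.

```python
import csv
import io

# Column names taken from the second header row of the table.
COLUMNS = [
    "#", "Team", "Task", "Date/Time", "DataID",
    "juman", "kytea", "mecab", "moses-tokenizer",
    "stanford-segmenter-ctb", "stanford-segmenter-pku",
    "indic-tokenizer", "unuse", "myseg", "kmseg",
    "Method", "Other Resources", "System Description",
]

# Three rows copied from the BLEU table, deliberately out of order.
SAMPLE = (
    "8\tORGANIZER\tALTmy-en\t2018/09/04 18:38:58\t2228\t-\t-\t-\t14.44"
    "\t-\t-\t-\t-\t0.00\t0.00\tNMT\tNo\tNMT with Attention\n"
    "20\tUCSMNLP\tALTmy-en\t2018/09/14 15:32:10\t2338\t-\t-\t-\t2.22"
    "\t-\t-\t-\t-\t0.00\t0.00\tSMT\tNo\twith PBSMT\n"
    "3\tNICT-4\tALTmy-en\t2018/08/23 10:29:46\t2069\t-\t-\t-\t21.97"
    "\t-\t-\t-\t-\t0.00\t0.00\tNMT\tNo\tNMT baseline: ensemble\n"
)


def parse_rows(tsv_text):
    """Turn tab-separated table rows into dictionaries keyed by column name."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    return [dict(zip(COLUMNS, row)) for row in reader if row]


def sort_by_segmenter(rows, segmenter):
    """Sort rows by one segmenter's score, highest first.

    Assumption: a "-" entry means no score was computed, so it sorts last.
    """
    def score(row):
        value = row[segmenter]
        return float(value) if value != "-" else float("-inf")

    return sorted(rows, key=score, reverse=True)


rows = parse_rows(SAMPLE)
for row in sort_by_segmenter(rows, "moses-tokenizer"):
    print(row["Team"], row["DataID"], row["moses-tokenizer"])
```

For this task only the moses-tokenizer column carries non-zero scores, so sorting on it gives the order shown in the table.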

RIBES

#	Team	Task	Date/Time	DataID	RIBES										Method	Other Resources	System Description
#	Team	Task	Date/Time	DataID	juman	kytea	mecab	moses- tokenizer	stanford- segmenter- ctb	stanford- segmenter- pku	indic- tokenizer	unuse	myseg	kmseg	Method	Other Resources	System Description
1	NICT-4	ALTmy-en	2018/09/13 15:36:40	2303	-	-	-	0.793960	-	-	-	-	0.000000	0.000000	Other	Yes	Many PBSMT and NMT n-best lists combined and reranked. Use monolingual data for back-translation and language model trainings.
2	NICT-4	ALTmy-en	2018/09/13 14:31:16	2290	-	-	-	0.753767	-	-	-	-	0.000000	0.000000	Other	No	Many PBSMT and NMT n-best lists combined and reranked
3	NICT-4	ALTmy-en	2018/08/23 10:29:46	2069	-	-	-	0.753209	-	-	-	-	0.000000	0.000000	NMT	No	NMT baseline: ensemble
4	NICT	ALTmy-en	2018/09/14 10:13:17	2329	-	-	-	0.740819	-	-	-	-	0.000000	0.000000	NMT	No	4 models ensemble
5	NICT-4	ALTmy-en	2018/08/23 10:28:04	2068	-	-	-	0.740401	-	-	-	-	0.000000	0.000000	NMT	No	NMT baseline: single system
6	NICT-5	ALTmy-en	2018/08/22 18:57:56	2056	-	-	-	0.717430	-	-	-	-	0.000000	0.000000	NMT	No	Simple Mixed Fine Tuning model using transformer.
7	NICT	ALTmy-en	2018/09/12 15:33:34	2281	-	-	-	0.710528	-	-	-	-	0.000000	0.000000	NMT	No	Single model
8	ORGANIZER	ALTmy-en	2018/09/04 18:38:58	2228	-	-	-	0.696861	-	-	-	-	0.000000	0.000000	NMT	No	NMT with Attention
9	XMUNLP	ALTmy-en	2018/09/16 08:46:40	2456	-	-	-	0.682031	-	-	-	-	0.000000	0.000000	NMT	No	single transformer model
10	XMUNLP	ALTmy-en	2018/09/15 16:42:12	2399	-	-	-	0.662820	-	-	-	-	0.000000	0.000000	NMT	No	single rnnsearch model
11	Osaka-U	ALTmy-en	2018/09/15 22:59:06	2438	-	-	-	0.655643	-	-	-	-	0.000000	0.000000	NMT	Yes	rewarding model
12	Osaka-U	ALTmy-en	2018/09/16 11:56:52	2463	-	-	-	0.648923	-	-	-	-	0.000000	0.000000	NMT	No	mixed fine tuning
13	UCSYNLP	ALTmy-en	2018/09/14 13:22:24	2332	-	-	-	0.642309	-	-	-	-	0.000000	0.000000	NMT	No	NMT with Attention
14	ORGANIZER	ALTmy-en	2018/08/24 15:29:41	2141	-	-	-	0.598345	-	-	-	-	0.000000	0.000000	Other	Yes	Online A
15	UCSYNLP	ALTmy-en	2018/09/15 15:44:56	2393	-	-	-	0.583956	-	-	-	-	0.000000	0.000000	SMT	No	HPBSMT
16	NICT-4	ALTmy-en	2018/08/23 10:39:02	2071	-	-	-	0.580091	-	-	-	-	0.000000	0.000000	SMT	Yes	MSLR, with language model trained on common-crawl data.
17	NICT-4	ALTmy-en	2018/09/13 14:33:29	2291	-	-	-	0.575931	-	-	-	-	0.000000	0.000000	SMT	No	with MSLR models, language models were trained on the target side of the parallel data
18	UCSYNLP	ALTmy-en	2018/09/15 15:25:27	2391	-	-	-	0.553786	-	-	-	-	0.000000	0.000000	SMT	No	OSM
19	UCSMNLP	ALTmy-en	2018/10/29 15:28:57	2549	-	-	-	0.536321	-	-	-	-	0.000000	0.000000	SMT	No	Batch MIRA tuning
20	UCSMNLP	ALTmy-en	2018/09/14 15:32:10	2338	-	-	-	0.470280	-	-	-	-	0.000000	0.000000	SMT	No	with PBSMT

Notice:

This table is sorted by the leftmost segmenters. You can change the segmenter used to sort by clicking each segmenter link.

AMFM

#	Team	Task	Date/Time	DataID	AMFM										Method	Other Resources	System Description
#	Team	Task	Date/Time	DataID	unuse	unuse	unuse	unuse	unuse	unuse	unuse	unuse	unuse	unuse	Method	Other Resources	System Description
1	NICT-4	ALTmy-en	2018/09/13 15:36:40	2303	-	-	-	0.655910	-	-	-	-	0.000000	0.000000	Other	Yes	Many PBSMT and NMT n-best lists combined and reranked. Use monolingual data for back-translation and language model trainings.
2	UCSYNLP	ALTmy-en	2018/09/15 15:25:27	2391	-	-	-	0.594800	-	-	-	-	0.000000	0.000000	SMT	No	OSM
3	NICT-4	ALTmy-en	2018/08/23 10:28:04	2068	-	-	-	0.589310	-	-	-	-	0.000000	0.000000	NMT	No	NMT baseline: single system
4	NICT	ALTmy-en	2018/09/12 15:33:34	2281	-	-	-	0.589020	-	-	-	-	0.000000	0.000000	NMT	No	Single model
5	NICT-4	ALTmy-en	2018/08/23 10:29:46	2069	-	-	-	0.586770	-	-	-	-	0.000000	0.000000	NMT	No	NMT baseline: ensemble
6	NICT-4	ALTmy-en	2018/09/13 14:33:29	2291	-	-	-	0.584040	-	-	-	-	0.000000	0.000000	SMT	No	with MSLR models, language models were trained on the target side of the parallel data
7	NICT-4	ALTmy-en	2018/09/13 14:31:16	2290	-	-	-	0.582230	-	-	-	-	0.000000	0.000000	Other	No	Many PBSMT and NMT n-best lists combined and reranked
8	NICT	ALTmy-en	2018/09/14 10:13:17	2329	-	-	-	0.580690	-	-	-	-	0.000000	0.000000	NMT	No	4 models ensemble
9	NICT-5	ALTmy-en	2018/08/22 18:57:56	2056	-	-	-	0.579520	-	-	-	-	0.000000	0.000000	NMT	No	Simple Mixed Fine Tuning model using transformer.
10	ORGANIZER	ALTmy-en	2018/08/24 15:29:41	2141	-	-	-	0.576780	-	-	-	-	0.000000	0.000000	Other	Yes	Online A
11	NICT-4	ALTmy-en	2018/08/23 10:39:02	2071	-	-	-	0.569370	-	-	-	-	0.000000	0.000000	SMT	Yes	MSLR, with language model trained on common-crawl data.
12	UCSYNLP	ALTmy-en	2018/09/15 15:44:56	2393	-	-	-	0.560800	-	-	-	-	0.000000	0.000000	SMT	No	HPBSMT
13	UCSMNLP	ALTmy-en	2018/10/29 15:28:57	2549	-	-	-	0.552430	-	-	-	-	0.000000	0.000000	SMT	No	Batch MIRA tuning
14	Osaka-U	ALTmy-en	2018/09/16 11:56:52	2463	-	-	-	0.552040	-	-	-	-	0.000000	0.000000	NMT	No	mixed fine tuning
15	XMUNLP	ALTmy-en	2018/09/16 08:46:40	2456	-	-	-	0.543700	-	-	-	-	0.000000	0.000000	NMT	No	single transformer model
16	ORGANIZER	ALTmy-en	2018/09/04 18:38:58	2228	-	-	-	0.525950	-	-	-	-	0.000000	0.000000	NMT	No	NMT with Attention
17	UCSYNLP	ALTmy-en	2018/09/14 13:22:24	2332	-	-	-	0.518990	-	-	-	-	0.000000	0.000000	NMT	No	NMT with Attention
18	Osaka-U	ALTmy-en	2018/09/15 22:59:06	2438	-	-	-	0.510900	-	-	-	-	0.000000	0.000000	NMT	Yes	rewarding model
19	XMUNLP	ALTmy-en	2018/09/15 16:42:12	2399	-	-	-	0.500210	-	-	-	-	0.000000	0.000000	NMT	No	single rnnsearch model
20	UCSMNLP	ALTmy-en	2018/09/14 15:32:10	2338	-	-	-	0.354550	-	-	-	-	0.000000	0.000000	SMT	No	with PBSMT

Notice:

This table is sorted by the leftmost segmenters. You can change the segmenter used to sort by clicking each segmenter link.
Adequacy-Fluency Metrics (AMFM) is a two-dimensional automatic evaluation metric for machine translation, designed to operate at the sentence level. It scores adequacy and fluency separately, decoupling the semantic and syntactic components of the translation process to give a balanced view of translation quality.
AMFM is calculated without tokenizers.
Details of AMFM are given in the paper "Adequacy–Fluency Metrics: Evaluating MT in the Continuous Space Model Framework". The invited talk at WAT2015 is also a helpful introduction.

HUMAN (WAT2022)

Notice:

HUMAN (WAT2022) is the result of the Pairwise Crowdsourcing Evaluation at WAT2022.
HUMAN (WAT2022) was evaluated by 5 different workers, and the final decision was made by voting on their judgements.
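The voting step can be sketched as follows. This page does not give the voting rule, so everything specific here is an assumption: each worker is assumed to judge a translation as better than (+1), equal to (0), or worse than (-1) a baseline; the sum of the judgements is compared with a threshold of 2; and the system-level score is taken as 100 × (wins − losses) / (wins + losses + ties). The function names are hypothetical.

```python
from collections import Counter


def sentence_decision(judgements, threshold=2):
    """Combine per-worker judgements (+1, 0, -1) into win / loss / tie.

    Assumption: the sum of the judgements is compared with +/- threshold.
    """
    total = sum(judgements)
    if total >= threshold:
        return "win"
    if total <= -threshold:
        return "loss"
    return "tie"


def pairwise_score(decisions):
    """Assumed system-level score: 100 * (wins - losses) / number of sentences."""
    if not decisions:
        raise ValueError("at least one sentence decision is required")
    counts = Counter(decisions)
    return 100.0 * (counts["win"] - counts["loss"]) / len(decisions)


# Five workers judging each of four sentences (made-up judgements).
votes = [
    [1, 1, 0, -1, 1],
    [1, -1, 0, 0, 0],
    [-1, -1, -1, 0, 1],
    [1, 1, 1, 1, 0],
]
decisions = [sentence_decision(v) for v in votes]
print(decisions, pairwise_score(decisions))
```

Under these assumptions the score ranges from -100 (every sentence judged worse than the baseline) to 100 (every sentence judged better).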

HUMAN (WAT2021)

Notice:

HUMAN (WAT2021) is the result of the Pairwise Crowdsourcing Evaluation at WAT2021.
HUMAN (WAT2021) was evaluated by 5 different workers, and the final decision was made by voting on their judgements.

HUMAN (WAT2020)

Notice:

HUMAN (WAT2020) is the result of the Pairwise Crowdsourcing Evaluation at WAT2020.
HUMAN (WAT2020) was evaluated by 5 different workers, and the final decision was made by voting on their judgements.

HUMAN (WAT2019)

Notice:

HUMAN (WAT2019) is the result of the Pairwise Crowdsourcing Evaluation at WAT2019.
HUMAN (WAT2019) was evaluated by 5 different workers, and the final decision was made by voting on their judgements.

HUMAN (WAT2018)

Notice:

HUMAN (WAT2018) is the result of the Pairwise Crowdsourcing Evaluation at WAT2018.
HUMAN (WAT2018) was evaluated by 5 different workers, and the final decision was made by voting on their judgements.

HUMAN (WAT2017)

Notice:

HUMAN (WAT2017) is the result of the Pairwise Crowdsourcing Evaluation at WAT2017.
HUMAN (WAT2017) was evaluated by 5 different workers, and the final decision was made by voting on their judgements.

HUMAN (WAT2016)

Notice:

HUMAN (WAT2016) is the result of the Pairwise Crowdsourcing Evaluation at WAT2016.
HUMAN (WAT2016) was evaluated by 5 different workers, and the final decision was made by voting on their judgements.

HUMAN (WAT2015)

Notice:

HUMAN (WAT2015) is the result of the Pairwise Crowdsourcing Evaluation at WAT2015.
HUMAN (WAT2015) was evaluated by 5 different workers, and the final decision was made by voting on their judgements.
Details of the evaluation can be found in the PDF document.

HUMAN (WAT2014)

Notice:

HUMAN (WAT2014) is the result of the Pairwise Crowdsourcing Evaluation at WAT2014.
HUMAN (WAT2014) was evaluated by 3 different workers, and the final decision was made by voting on their judgements.
Details of the evaluation can be found in the PDF document.

EVALUATION RESULTS USAGE POLICY

When you use the WAT evaluation results for any purpose, such as:
- writing technical papers,
- making presentations about your system,
- advertising your MT system to customers,
you may use the translation directions, the scores (both automatic and human evaluations), and the rank of your system among the others. You may also use the scores of the other systems, but you MUST anonymize the other systems' names. In addition, you may show the links (URLs) to the WAT evaluation result pages.
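One way to apply the anonymization requirement when reusing a results table is sketched below. The function name `anonymize_teams` and the "System N" labels are hypothetical choices, and the policy does not say whether one consistent alias per team is sufficient, so treat this as an illustration rather than an approved procedure.

```python
def anonymize_teams(rows, own_team):
    """Replace every team name except `own_team` with a neutral alias.

    `rows` is a list of (team, score) pairs. Aliases are assigned in order of
    first appearance, so a team keeps the same alias across its submissions.
    """
    aliases = {}
    result = []
    for team, score in rows:
        if team == own_team:
            label = team
        else:
            if team not in aliases:
                aliases[team] = f"System {len(aliases) + 1}"
            label = aliases[team]
        result.append((label, score))
    return result


# BLEU scores copied from the table above; XMUNLP plays the reporting team.
rows = [
    ("NICT-4", 29.14),
    ("NICT", 20.82),
    ("NICT", 16.31),
    ("XMUNLP", 12.71),
]
for label, score in anonymize_teams(rows, own_team="XMUNLP"):
    print(label, score)
```

Method and system description columns can also identify a team, so a real report may need to drop or generalize those fields as well.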

NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2018-08-02
