
WAT: The Workshop on Asian Translation
Evaluation Results

[EVALUATION RESULTS TOP] | [BLEU] | [RIBES] | [AMFM] | [HUMAN (WAT2022)] | [HUMAN (WAT2021)] | [HUMAN (WAT2020)] | [HUMAN (WAT2019)] | [HUMAN (WAT2018)] | [HUMAN (WAT2017)] | [HUMAN (WAT2016)] | [HUMAN (WAT2015)] | [HUMAN (WAT2014)] | [EVALUATION RESULTS USAGE POLICY]

BLEU


BLEU is reported for the Japanese segmenters juman, kytea, and mecab. The remaining tokenizer columns of the original table (moses-tokenizer, stanford-segmenter-ctb, stanford-segmenter-pku, indic-tokenizer, unuse, myseg, kmseg) are empty for this task and are omitted below.

# | Team | Task | Date/Time | DataID | BLEU (juman) | BLEU (kytea) | BLEU (mecab) | Method | Other Resources | System Description
1 | ut-mrt | BSDen-ja | 2020/09/18 18:49:05 | 3942 | 19.50 | 25.50 | 20.54 | NMT | Yes | Transformer-base ensemble of the 2 best models trained on large batches of the full BSD corpus (80k), AMI meeting corpus, Ontonotes 5.0, WMT 2020, and news corpora; tuned on the full BSD corpus.
2 | goku20 | BSDen-ja | 2020/09/15 20:33:23 | 3756 | 19.43 | 26.04 | 20.75 | NMT | Yes | mBART pre-training, document-level ensembled model, JESC parallel corpus.
3 | DEEPNLP | BSDen-ja | 2020/09/19 15:03:40 | 4050 | 19.39 | 26.59 | 20.95 | NMT | Yes | An ensemble of Transformer models trained on several publicly available JA-EN datasets (JESC, KFTT, MTNT, etc.), then fine-tuned on filtered back-translated data and further fine-tuned on BSD.
4 | ut-mrt | BSDen-ja | 2020/09/17 12:04:19 | 3793 | 18.85 | 24.84 | 20.04 | NMT | Yes | Transformer-base trained on large batches of the full BSD corpus (80k), AMI meeting corpus, Ontonotes 5.0, and jParaCrawl; tuned on the full BSD corpus.
5 | DEEPNLP | BSDen-ja | 2020/09/19 03:23:53 | 4027 | 18.76 | 25.86 | 20.19 | NMT | Yes | Training corpus is a mix of several publicly available JA-EN datasets (JESC, KFTT, MTNT, etc.); base Transformer trained and then fine-tuned on BSD.
6 | ut-mrt | BSDen-ja | 2020/09/19 20:55:04 | 4076 | 17.25 | 23.55 | 18.60 | NMT | Yes | Transformer-base trained on document-aligned news data, the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0, with one previous context sentence.
7 | ut-mrt | BSDen-ja | 2020/09/19 20:21:04 | 4068 | 14.87 | 20.58 | 16.08 | NMT | Yes | Transformer-base trained on data from WMT 2020 without any BSD.
8 | ut-mrt | BSDen-ja | 2020/09/19 20:44:13 | 4071 | 14.59 | 21.09 | 16.03 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0, with domain tags to separate each corpus and one previous context sentence. Average of the 4 best models.
9 | goku20 | BSDen-ja | 2020/09/15 20:27:00 | 3753 | 14.49 | 20.89 | 15.75 | NMT | No | mBART pre-training, document-level single model.
10 | ut-mrt | BSDen-ja | 2020/09/19 20:49:11 | 4074 | 14.19 | 20.32 | 15.49 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0, with domain tags to separate each corpus. Average of the 4 best models.
11 | ut-mrt | BSDen-ja | 2020/09/17 13:50:02 | 3803 | 13.78 | 20.02 | 15.26 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0.
12 | ut-mrt | BSDen-ja | 2020/09/19 20:25:59 | 4070 | 12.55 | 17.60 | 13.47 | NMT | Yes | Transformer-base trained on data from WMT 2020 plus the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0.
13 | ut-mrt | BSDen-ja | 2020/09/17 13:55:20 | 3805 | 12.20 | 18.37 | 13.63 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k).
14 | ut-mrt | BSDen-ja | 2020/09/18 17:57:22 | 3937 | 11.75 | 17.94 | 13.13 | NMT | Yes | Transformer-small trained on the full BSD corpus (80k) with one previous context sentence.
15 | DEEPNLP | BSDen-ja | 2020/09/19 15:01:28 | 4049 | 8.66 | 15.10 | 10.10 | NMT | No | An ensemble of two Transformer models trained on the BSD corpus (20k).
16 | DEEPNLP | BSDen-ja | 2020/09/19 03:22:48 | 4026 | 7.90 | 14.03 | 9.10 | NMT | No | Transformer model trained on the BSD corpus (20k).
17 | ut-mrt | BSDen-ja | 2020/09/18 17:19:52 | 3920 | 7.83 | 13.51 | 9.21 | SMT | No | SMT baseline trained on the BSD corpus from GitHub (20k).
18 | ut-mrt | BSDen-ja | 2020/09/19 18:55:29 | 4060 | 6.70 | 12.13 | 7.84 | NMT | No | Transformer-small (4-layer) trained on the BSD corpus from GitHub (20k) with one previous context sentence. Average of the four best models.
19 | ut-mrt | BSDen-ja | 2020/09/17 16:51:27 | 3810 | 5.84 | 10.82 | 6.97 | NMT | No | Transformer-small (4-layer) trained on the BSD corpus from GitHub (20k).
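The three BLEU columns differ only in how the Japanese text is segmented before scoring (JUMAN, KyTea, or MeCab). As an illustration only, not the official WAT scoring pipeline, the sketch below shows how a tokenizer-dependent BLEU score of this kind can be computed with sacrebleu; the file names are hypothetical, and MeCab stands in for whichever segmenter a column uses.

```python
# Illustrative sketch only: approximate recomputation of a tokenizer-dependent
# BLEU column, assuming plain UTF-8 files with one sentence per line.
# "hypotheses.ja" and "references.ja" are hypothetical file names.
import MeCab      # pip install mecab-python3 unidic-lite
import sacrebleu  # pip install sacrebleu

tagger = MeCab.Tagger("-Owakati")  # output space-separated surface forms

def segment(sentence: str) -> str:
    """Segment one Japanese sentence into space-separated tokens."""
    return tagger.parse(sentence).strip()

with open("hypotheses.ja", encoding="utf-8") as f:
    hyps = [segment(line.rstrip("\n")) for line in f]
with open("references.ja", encoding="utf-8") as f:
    refs = [segment(line.rstrip("\n")) for line in f]

# Segmentation is already applied above, so sacrebleu's own tokenizer is disabled.
bleu = sacrebleu.corpus_bleu(hyps, [refs], tokenize="none")
print(f"BLEU (MeCab-segmented): {bleu.score:.2f}")
```

Because the official evaluation uses its own scripts and segmenter versions, numbers reproduced this way may differ slightly from the table above.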


RIBES


RIBES is reported for the Japanese segmenters juman, kytea, and mecab; as with BLEU, the remaining tokenizer columns of the original table are empty for this task and are omitted below.

# | Team | Task | Date/Time | DataID | RIBES (juman) | RIBES (kytea) | RIBES (mecab) | Method | Other Resources | System Description
1 | goku20 | BSDen-ja | 2020/09/15 20:33:23 | 3756 | 0.745493 | 0.771588 | 0.749985 | NMT | Yes | mBART pre-training, document-level ensembled model, JESC parallel corpus.
2 | DEEPNLP | BSDen-ja | 2020/09/19 15:03:40 | 4050 | 0.744209 | 0.770217 | 0.753233 | NMT | Yes | An ensemble of Transformer models trained on several publicly available JA-EN datasets (JESC, KFTT, MTNT, etc.), then fine-tuned on filtered back-translated data and further fine-tuned on BSD.
3 | DEEPNLP | BSDen-ja | 2020/09/19 03:23:53 | 4027 | 0.738947 | 0.767264 | 0.749352 | NMT | Yes | Training corpus is a mix of several publicly available JA-EN datasets (JESC, KFTT, MTNT, etc.); base Transformer trained and then fine-tuned on BSD.
4 | ut-mrt | BSDen-ja | 2020/09/19 20:55:04 | 4076 | 0.734692 | 0.759465 | 0.744708 | NMT | Yes | Transformer-base trained on document-aligned news data, the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0, with one previous context sentence.
5 | ut-mrt | BSDen-ja | 2020/09/18 18:49:05 | 3942 | 0.724624 | 0.758777 | 0.740521 | NMT | Yes | Transformer-base ensemble of the 2 best models trained on large batches of the full BSD corpus (80k), AMI meeting corpus, Ontonotes 5.0, WMT 2020, and news corpora; tuned on the full BSD corpus.
6 | ut-mrt | BSDen-ja | 2020/09/19 20:44:13 | 4071 | 0.720883 | 0.744652 | 0.731101 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0, with domain tags to separate each corpus and one previous context sentence. Average of the 4 best models.
7 | ut-mrt | BSDen-ja | 2020/09/17 12:04:19 | 3793 | 0.720365 | 0.755118 | 0.736793 | NMT | Yes | Transformer-base trained on large batches of the full BSD corpus (80k), AMI meeting corpus, Ontonotes 5.0, and jParaCrawl; tuned on the full BSD corpus.
8 | goku20 | BSDen-ja | 2020/09/15 20:27:00 | 3753 | 0.700058 | 0.739468 | 0.715971 | NMT | No | mBART pre-training, document-level single model.
9 | ut-mrt | BSDen-ja | 2020/09/19 20:49:11 | 4074 | 0.698327 | 0.737332 | 0.719519 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0, with domain tags to separate each corpus. Average of the 4 best models.
10 | ut-mrt | BSDen-ja | 2020/09/19 20:21:04 | 4068 | 0.696633 | 0.725803 | 0.707683 | NMT | Yes | Transformer-base trained on data from WMT 2020 without any BSD.
11 | ut-mrt | BSDen-ja | 2020/09/18 17:57:22 | 3937 | 0.691909 | 0.723216 | 0.705550 | NMT | Yes | Transformer-small trained on the full BSD corpus (80k) with one previous context sentence.
12 | ut-mrt | BSDen-ja | 2020/09/17 13:55:20 | 3805 | 0.687411 | 0.721043 | 0.703018 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k).
13 | ut-mrt | BSDen-ja | 2020/09/17 13:50:02 | 3803 | 0.682091 | 0.722163 | 0.701005 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0.
14 | ut-mrt | BSDen-ja | 2020/09/19 20:25:59 | 4070 | 0.643793 | 0.685148 | 0.658451 | NMT | Yes | Transformer-base trained on data from WMT 2020 plus the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0.
15 | DEEPNLP | BSDen-ja | 2020/09/19 15:01:28 | 4049 | 0.640669 | 0.686233 | 0.658636 | NMT | No | An ensemble of two Transformer models trained on the BSD corpus (20k).
16 | DEEPNLP | BSDen-ja | 2020/09/19 03:22:48 | 4026 | 0.632236 | 0.676537 | 0.645595 | NMT | No | Transformer model trained on the BSD corpus (20k).
17 | ut-mrt | BSDen-ja | 2020/09/18 17:19:52 | 3920 | 0.625846 | 0.666769 | 0.641864 | SMT | No | SMT baseline trained on the BSD corpus from GitHub (20k).
18 | ut-mrt | BSDen-ja | 2020/09/19 18:55:29 | 4060 | 0.597037 | 0.648397 | 0.613298 | NMT | No | Transformer-small (4-layer) trained on the BSD corpus from GitHub (20k) with one previous context sentence. Average of the four best models.
19 | ut-mrt | BSDen-ja | 2020/09/17 16:51:27 | 3810 | 0.566563 | 0.634274 | 0.596934 | NMT | No | Transformer-small (4-layer) trained on the BSD corpus from GitHub (20k).
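RIBES (Isozaki et al., 2010) rewards correct global word order, which matters for the reordering involved in English-to-Japanese translation. As a hedged reminder of what these numbers measure (the official RIBES.py script is the definitive reference), the sentence-level score is commonly written as:

```latex
% Hedged sketch of the RIBES sentence score (Isozaki et al., 2010).
% NKT: Kendall's tau over the ranks of aligned words, normalized to [0, 1];
% P: unigram precision; BP: BLEU-style brevity penalty.
\[
  \mathrm{RIBES} = \mathrm{NKT} \cdot P^{\alpha} \cdot \mathrm{BP}^{\beta},
  \qquad \mathrm{NKT} = \frac{\tau + 1}{2}
\]
% Commonly used defaults in the reference implementation:
% \alpha = 0.25, \beta = 0.10; the corpus score averages over sentences.
```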


AMFM


In the original table all AMFM sub-columns are marked "unuse" for this task, and every submission is listed with a score of 0.000000; a single AMFM column is shown here.

# | Team | Task | Date/Time | DataID | AMFM | Method | Other Resources | System Description
1 | goku20 | BSDen-ja | 2020/09/15 20:27:00 | 3753 | 0.000000 | NMT | No | mBART pre-training, document-level single model.
2 | goku20 | BSDen-ja | 2020/09/15 20:33:23 | 3756 | 0.000000 | NMT | Yes | mBART pre-training, document-level ensembled model, JESC parallel corpus.
3 | ut-mrt | BSDen-ja | 2020/09/17 12:04:19 | 3793 | 0.000000 | NMT | Yes | Transformer-base trained on large batches of the full BSD corpus (80k), AMI meeting corpus, Ontonotes 5.0, and jParaCrawl; tuned on the full BSD corpus.
4 | ut-mrt | BSDen-ja | 2020/09/17 13:50:02 | 3803 | 0.000000 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0.
5 | ut-mrt | BSDen-ja | 2020/09/17 13:55:20 | 3805 | 0.000000 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k).
6 | ut-mrt | BSDen-ja | 2020/09/17 16:51:27 | 3810 | 0.000000 | NMT | No | Transformer-small (4-layer) trained on the BSD corpus from GitHub (20k).
7 | ut-mrt | BSDen-ja | 2020/09/18 17:19:52 | 3920 | 0.000000 | SMT | No | SMT baseline trained on the BSD corpus from GitHub (20k).
8 | ut-mrt | BSDen-ja | 2020/09/18 17:57:22 | 3937 | 0.000000 | NMT | Yes | Transformer-small trained on the full BSD corpus (80k) with one previous context sentence.
9 | ut-mrt | BSDen-ja | 2020/09/18 18:49:05 | 3942 | 0.000000 | NMT | Yes | Transformer-base ensemble of the 2 best models trained on large batches of the full BSD corpus (80k), AMI meeting corpus, Ontonotes 5.0, WMT 2020, and news corpora; tuned on the full BSD corpus.
10 | DEEPNLP | BSDen-ja | 2020/09/19 03:22:48 | 4026 | 0.000000 | NMT | No | Transformer model trained on the BSD corpus (20k).
11 | DEEPNLP | BSDen-ja | 2020/09/19 03:23:53 | 4027 | 0.000000 | NMT | Yes | Training corpus is a mix of several publicly available JA-EN datasets (JESC, KFTT, MTNT, etc.); base Transformer trained and then fine-tuned on BSD.
12 | DEEPNLP | BSDen-ja | 2020/09/19 15:01:28 | 4049 | 0.000000 | NMT | No | An ensemble of two Transformer models trained on the BSD corpus (20k).
13 | DEEPNLP | BSDen-ja | 2020/09/19 15:03:40 | 4050 | 0.000000 | NMT | Yes | An ensemble of Transformer models trained on several publicly available JA-EN datasets (JESC, KFTT, MTNT, etc.), then fine-tuned on filtered back-translated data and further fine-tuned on BSD.
14 | ut-mrt | BSDen-ja | 2020/09/19 18:55:29 | 4060 | 0.000000 | NMT | No | Transformer-small (4-layer) trained on the BSD corpus from GitHub (20k) with one previous context sentence. Average of the four best models.
15 | ut-mrt | BSDen-ja | 2020/09/19 20:21:04 | 4068 | 0.000000 | NMT | Yes | Transformer-base trained on data from WMT 2020 without any BSD.
16 | ut-mrt | BSDen-ja | 2020/09/19 20:25:59 | 4070 | 0.000000 | NMT | Yes | Transformer-base trained on data from WMT 2020 plus the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0.
17 | ut-mrt | BSDen-ja | 2020/09/19 20:44:13 | 4071 | 0.000000 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0, with domain tags to separate each corpus and one previous context sentence. Average of the 4 best models.
18 | ut-mrt | BSDen-ja | 2020/09/19 20:49:11 | 4074 | 0.000000 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0, with domain tags to separate each corpus. Average of the 4 best models.
19 | ut-mrt | BSDen-ja | 2020/09/19 20:55:04 | 4076 | 0.000000 | NMT | Yes | Transformer-base trained on document-aligned news data, the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0, with one previous context sentence.


HUMAN (WAT2022)


(No entries.)

HUMAN (WAT2021)


(No entries.)

HUMAN (WAT2020)


# | Team | Task | Date/Time | DataID | HUMAN | Method | Other Resources | System Description
1 | goku20 | BSDen-ja | 2020/09/15 20:33:23 | 3756 | 4.200 | NMT | Yes | mBART pre-training, document-level ensembled model, JESC parallel corpus.
2 | DEEPNLP | BSDen-ja | 2020/09/19 15:03:40 | 4050 | 4.130 | NMT | Yes | An ensemble of Transformer models trained on several publicly available JA-EN datasets (JESC, KFTT, MTNT, etc.), then fine-tuned on filtered back-translated data and further fine-tuned on BSD.
3 | ut-mrt | BSDen-ja | 2020/09/19 20:49:11 | 4074 | 3.560 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0, with domain tags to separate each corpus. Average of the 4 best models.
4 | goku20 | BSDen-ja | 2020/09/15 20:27:00 | 3753 | 3.550 | NMT | No | mBART pre-training, document-level single model.
5 | ut-mrt | BSDen-ja | 2020/09/19 20:44:13 | 4071 | 3.520 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), AMI meeting corpus, and Ontonotes 5.0, with domain tags to separate each corpus and one previous context sentence. Average of the 4 best models.
6 | DEEPNLP | BSDen-ja | 2020/09/19 15:01:28 | 4049 | 2.600 | NMT | No | An ensemble of two Transformer models trained on the BSD corpus (20k).


HUMAN (WAT2019)


(No entries.)

HUMAN (WAT2018)


(No entries.)

HUMAN (WAT2017)


(No entries.)

HUMAN (WAT2016)


(No entries.)

HUMAN (WAT2015)


(No entries.)

HUMAN (WAT2014)


(No entries.)

EVALUATION RESULTS USAGE POLICY

When you use the WAT evaluation results for any purpose, such as:
- writing technical papers,
- making presentations about your system, or
- advertising your MT system to customers,
you may use the information about translation directions, the scores (both automatic and human evaluations), and the rank of your system among the others. You may also use the scores of the other systems, but you MUST anonymize the other systems' names. In addition, you may show links (URLs) to the WAT evaluation result pages.

NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2018-08-02