
WAT

The Workshop on Asian Translation
Evaluation Results


BLEU


#  | Team      | Task     | Date/Time           | DataID | BLEU  | Method | Other Resources | System Description
(BLEU is reported under the moses-tokenizer setting; the juman, kytea, mecab, stanford-segmenter-ctb, stanford-segmenter-pku, indic-tokenizer, myseg and kmseg columns are empty for this task.)
1  | ut-mrt    | BSDja-en | 2020/09/18 19:35:10 | 3966 | 23.80 | NMT | Yes | Transformer-base ensemble of the 2 best models trained on large batches of the full BSD corpus (80k), the AMI meeting corpus, Ontonotes 5.0 and the WMT 2020 corpora; tuned on the full BSD corpus.
2  | goku20    | BSDja-en | 2020/09/15 19:41:51 | 3747 | 23.15 | NMT | Yes | mBART pre-training, doc-level ensembled model, JESC parallel corpus.
3  | DEEPNLP   | BSDja-en | 2020/09/19 15:00:04 | 4048 | 22.83 | NMT | Yes | An ensemble of transformer models trained on several publicly available JA-EN datasets such as JESC, KFTT and MTNT, then fine-tuned on filtered back-translated data, followed by fine-tuning on BSD.
4  | ut-mrt    | BSDja-en | 2020/09/19 20:54:35 | 4075 | 21.64 | NMT | Yes | Transformer-base trained on document-aligned news data, the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0, with one previous context sentence.
5  | ut-mrt    | BSDja-en | 2020/09/19 20:20:28 | 4067 | 18.94 | NMT | Yes | Transformer-base trained on data from WMT 2020 plus the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0.
6  | adapt-dcu | BSDja-en | 2020/09/17 23:50:55 | 3836 | 18.70 | NMT | Yes | Training corpus is a mix of OpenSubtitles, JESC and BSD. The Marian-NMT toolkit was used to train a transformer, fine-tuned on the same corpus oversampled with BSD.
7  | adapt-dcu | BSDja-en | 2020/09/18 20:07:07 | 3970 | 18.59 | NMT | Yes | The Marian-NMT toolkit was used to train a transformer, fine-tuned on a source-original synthetic corpus as well as the BSD training corpus.
8  | ut-mrt    | BSDja-en | 2020/09/19 20:47:47 | 4073 | 18.57 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0, with domain tags to separate each corpus. Average of the 4 best models.
9  | adapt-dcu | BSDja-en | 2020/09/18 19:26:01 | 3963 | 18.53 | NMT | Yes | Training corpus is a mix of OpenSubtitles, JESC and BSD. The Marian-NMT toolkit was used to train a transformer, fine-tuned on a source-original synthetic corpus.
10 | ut-mrt    | BSDja-en | 2020/09/19 20:46:38 | 4072 | 18.05 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0, with domain tags to separate each corpus and one previous context sentence. Average of the 4 best models.
11 | ut-mrt    | BSDja-en | 2020/09/17 13:49:46 | 3802 | 17.58 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0.
12 | adapt-dcu | BSDja-en | 2020/09/18 18:35:31 | 3940 | 17.33 | NMT | Yes | Training corpus is a mix of OpenSubtitles, JESC and BSD. The Marian-NMT toolkit was used to train a transformer model (baseline model).
13 | goku20    | BSDja-en | 2020/09/15 19:43:02 | 3748 | 17.02 | NMT | No  | mBART pre-training, sentence-level single model.
14 | ut-mrt    | BSDja-en | 2020/09/19 20:25:23 | 4069 | 16.99 | NMT | Yes | Transformer-base trained on data from WMT 2020 without any BSD.
15 | ut-mrt    | BSDja-en | 2020/09/17 13:55:01 | 3804 | 15.83 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k).
16 | ut-mrt    | BSDja-en | 2020/09/18 17:57:02 | 3936 | 14.49 | NMT | Yes | Transformer-small trained on the full BSD corpus (80k) with one previous context sentence.
17 | DEEPNLP   | BSDja-en | 2020/09/19 14:57:57 | 4047 | 10.91 | NMT | No  | An ensemble of two transformer models trained on the BSD corpus (20k).
18 | DEEPNLP   | BSDja-en | 2020/09/17 16:21:53 | 3808 |  9.70 | NMT | No  | Transformer-base model trained on the BSD corpus (20k).
19 | ut-mrt    | BSDja-en | 2020/09/19 19:06:00 | 4065 |  7.67 | NMT | No  | Transformer-small (4-layer) trained on the BSD corpus from GitHub (20k). Average of the four best models.
20 | DEEPNLP   | BSDja-en | 2020/10/16 18:03:12 | 4162 |  7.47 | NMT | No  |
21 | ut-mrt    | BSDja-en | 2020/09/19 18:55:02 | 4059 |  7.43 | NMT | No  | Transformer-small (4-layer) trained on the BSD corpus from GitHub (20k) with one previous context sentence. Average of the four best models.
22 | ut-mrt    | BSDja-en | 2020/09/18 17:19:26 | 3919 |  6.88 | SMT | No  | SMT baseline trained on the BSD corpus from GitHub (20k).
23 | DEEPNLP   | BSDja-en | 2020/09/17 15:58:22 | 3806 |  6.27 | NMT | No  | Transformer-base model trained on the BSD corpus (20k).
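The BLEU scores above are computed on the English output under the moses-tokenizer setting. As a rough point of reference only, and not the official WAT scoring pipeline, a comparable corpus-level BLEU can be obtained with the sacrebleu library, whose mteval-13a tokenizer approximates Moses tokenization. The function and file names below (corpus_bleu_moses_style, system_output.en, reference.en) are placeholders invented for this sketch.

```python
# Minimal sketch of a moses-style corpus BLEU computation with sacrebleu.
# This is NOT the official WAT scoring script; file names are placeholders.
import sacrebleu

def corpus_bleu_moses_style(hyp_path: str, ref_path: str) -> float:
    """Read one detokenized sentence per line from each file and return corpus BLEU."""
    with open(hyp_path, encoding="utf-8") as f:
        hypotheses = [line.strip() for line in f]
    with open(ref_path, encoding="utf-8") as f:
        references = [line.strip() for line in f]

    # tokenize="13a" applies mteval-13a tokenization, which is close to the
    # Moses tokenizer used for the English side of this task.
    bleu = sacrebleu.corpus_bleu(hypotheses, [references], tokenize="13a")
    return bleu.score

if __name__ == "__main__":
    print(corpus_bleu_moses_style("system_output.en", "reference.en"))
```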


RIBES


#  | Team      | Task     | Date/Time           | DataID | RIBES    | Method | Other Resources | System Description
(RIBES is reported under the moses-tokenizer setting; the other segmenter columns are empty for this task.)
1  | goku20    | BSDja-en | 2020/09/15 19:41:51 | 3747 | 0.755099 | NMT | Yes | mBART pre-training, doc-level ensembled model, JESC parallel corpus.
2  | DEEPNLP   | BSDja-en | 2020/09/19 15:00:04 | 4048 | 0.752619 | NMT | Yes | An ensemble of transformer models trained on several publicly available JA-EN datasets such as JESC, KFTT and MTNT, then fine-tuned on filtered back-translated data, followed by fine-tuning on BSD.
3  | ut-mrt    | BSDja-en | 2020/09/18 19:35:10 | 3966 | 0.746855 | NMT | Yes | Transformer-base ensemble of the 2 best models trained on large batches of the full BSD corpus (80k), the AMI meeting corpus, Ontonotes 5.0 and the WMT 2020 corpora; tuned on the full BSD corpus.
4  | ut-mrt    | BSDja-en | 2020/09/19 20:54:35 | 4075 | 0.746032 | NMT | Yes | Transformer-base trained on document-aligned news data, the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0, with one previous context sentence.
5  | adapt-dcu | BSDja-en | 2020/09/18 20:07:07 | 3970 | 0.732256 | NMT | Yes | The Marian-NMT toolkit was used to train a transformer, fine-tuned on a source-original synthetic corpus as well as the BSD training corpus.
6  | adapt-dcu | BSDja-en | 2020/09/18 19:26:01 | 3963 | 0.731639 | NMT | Yes | Training corpus is a mix of OpenSubtitles, JESC and BSD. The Marian-NMT toolkit was used to train a transformer, fine-tuned on a source-original synthetic corpus.
7  | adapt-dcu | BSDja-en | 2020/09/17 23:50:55 | 3836 | 0.730460 | NMT | Yes | Training corpus is a mix of OpenSubtitles, JESC and BSD. The Marian-NMT toolkit was used to train a transformer, fine-tuned on the same corpus oversampled with BSD.
8  | ut-mrt    | BSDja-en | 2020/09/19 20:46:38 | 4072 | 0.723202 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0, with domain tags to separate each corpus and one previous context sentence. Average of the 4 best models.
9  | ut-mrt    | BSDja-en | 2020/09/19 20:47:47 | 4073 | 0.720809 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0, with domain tags to separate each corpus. Average of the 4 best models.
10 | adapt-dcu | BSDja-en | 2020/09/18 18:35:31 | 3940 | 0.714268 | NMT | Yes | Training corpus is a mix of OpenSubtitles, JESC and BSD. The Marian-NMT toolkit was used to train a transformer model (baseline model).
11 | ut-mrt    | BSDja-en | 2020/09/17 13:49:46 | 3802 | 0.710701 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0.
12 | ut-mrt    | BSDja-en | 2020/09/17 13:55:01 | 3804 | 0.699950 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k).
13 | ut-mrt    | BSDja-en | 2020/09/19 20:20:28 | 4067 | 0.698102 | NMT | Yes | Transformer-base trained on data from WMT 2020 plus the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0.
14 | ut-mrt    | BSDja-en | 2020/09/18 17:57:02 | 3936 | 0.697037 | NMT | Yes | Transformer-small trained on the full BSD corpus (80k) with one previous context sentence.
15 | goku20    | BSDja-en | 2020/09/15 19:43:02 | 3748 | 0.688501 | NMT | No  | mBART pre-training, sentence-level single model.
16 | ut-mrt    | BSDja-en | 2020/09/19 20:25:23 | 4069 | 0.677070 | NMT | Yes | Transformer-base trained on data from WMT 2020 without any BSD.
17 | DEEPNLP   | BSDja-en | 2020/09/19 14:57:57 | 4047 | 0.615230 | NMT | No  | An ensemble of two transformer models trained on the BSD corpus (20k).
18 | DEEPNLP   | BSDja-en | 2020/09/17 16:21:53 | 3808 | 0.606886 | NMT | No  | Transformer-base model trained on the BSD corpus (20k).
19 | DEEPNLP   | BSDja-en | 2020/10/16 18:03:12 | 4162 | 0.587745 | NMT | No  |
20 | ut-mrt    | BSDja-en | 2020/09/19 19:06:00 | 4065 | 0.580769 | NMT | No  | Transformer-small (4-layer) trained on the BSD corpus from GitHub (20k). Average of the four best models.
21 | ut-mrt    | BSDja-en | 2020/09/19 18:55:02 | 4059 | 0.576275 | NMT | No  | Transformer-small (4-layer) trained on the BSD corpus from GitHub (20k) with one previous context sentence. Average of the four best models.
22 | ut-mrt    | BSDja-en | 2020/09/18 17:19:26 | 3919 | 0.575330 | SMT | No  | SMT baseline trained on the BSD corpus from GitHub (20k).
23 | DEEPNLP   | BSDja-en | 2020/09/17 15:58:22 | 3806 | 0.531230 | NMT | No  | Transformer-base model trained on the BSD corpus (20k).
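RIBES (Rank-based Intuitive Bilingual Evaluation Score) rewards hypotheses that preserve the reference word order, which is why it is often reported alongside BLEU for Japanese-to-English translation. The official RIBES.py script should be used to reproduce the scores above; the sketch below is only a simplified illustration of the underlying formula, aligning just the words that occur exactly once in both sentences and combining normalized Kendall's tau with a unigram-precision and a brevity-penalty term (alpha=0.25 and beta=0.10 are the commonly cited defaults). The helper name ribes_like and the example sentences are invented for this sketch.

```python
# Simplified RIBES-style score: an illustration only, not the official RIBES.py.
# Aligns only words that appear exactly once in both hypothesis and reference.
import math
from collections import Counter

ALPHA = 0.25   # weight of the unigram-precision term (commonly cited default)
BETA = 0.10    # weight of the brevity-penalty term (commonly cited default)

def ribes_like(hypothesis: str, reference: str) -> float:
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp or not ref:
        return 0.0

    hyp_counts, ref_counts = Counter(hyp), Counter(ref)
    # Reference positions of hypothesis words, restricted to words that are
    # unambiguous (occur exactly once on both sides).
    ref_pos = [ref.index(w) for w in hyp
               if hyp_counts[w] == 1 and ref_counts[w] == 1]
    n = len(ref_pos)
    if n < 2:
        return 0.0

    # Normalized Kendall's tau: fraction of aligned word pairs kept in reference order.
    pairs = n * (n - 1) // 2
    ascending = sum(1 for i in range(n) for j in range(i + 1, n)
                    if ref_pos[i] < ref_pos[j])
    nkt = ascending / pairs

    precision = n / len(hyp)                              # unigram precision
    bp = min(1.0, math.exp(1.0 - len(ref) / len(hyp)))    # brevity penalty
    return nkt * (precision ** ALPHA) * (bp ** BETA)

if __name__ == "__main__":
    print(ribes_like("the meeting will start at ten",
                     "the meeting starts at ten"))
```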


AMFM


#  | Team      | Task     | Date/Time           | DataID | AMFM     | Method | Other Resources | System Description
1  | goku20    | BSDja-en | 2020/09/15 19:41:51 | 3747 | 0.000000 | NMT | Yes | mBART pre-training, doc-level ensembled model, JESC parallel corpus.
2  | goku20    | BSDja-en | 2020/09/15 19:43:02 | 3748 | 0.000000 | NMT | No  | mBART pre-training, sentence-level single model.
3  | ut-mrt    | BSDja-en | 2020/09/17 13:49:46 | 3802 | 0.000000 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0.
4  | ut-mrt    | BSDja-en | 2020/09/17 13:55:01 | 3804 | 0.000000 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k).
5  | DEEPNLP   | BSDja-en | 2020/09/17 15:58:22 | 3806 | 0.000000 | NMT | No  | Transformer-base model trained on the BSD corpus (20k).
6  | DEEPNLP   | BSDja-en | 2020/09/17 16:21:53 | 3808 | 0.000000 | NMT | No  | Transformer-base model trained on the BSD corpus (20k).
7  | adapt-dcu | BSDja-en | 2020/09/17 23:50:55 | 3836 | 0.000000 | NMT | Yes | Training corpus is a mix of OpenSubtitles, JESC and BSD. The Marian-NMT toolkit was used to train a transformer, fine-tuned on the same corpus oversampled with BSD.
8  | ut-mrt    | BSDja-en | 2020/09/18 17:19:26 | 3919 | 0.000000 | SMT | No  | SMT baseline trained on the BSD corpus from GitHub (20k).
9  | ut-mrt    | BSDja-en | 2020/09/18 17:57:02 | 3936 | 0.000000 | NMT | Yes | Transformer-small trained on the full BSD corpus (80k) with one previous context sentence.
10 | adapt-dcu | BSDja-en | 2020/09/18 18:35:31 | 3940 | 0.000000 | NMT | Yes | Training corpus is a mix of OpenSubtitles, JESC and BSD. The Marian-NMT toolkit was used to train a transformer model (baseline model).
11 | adapt-dcu | BSDja-en | 2020/09/18 19:26:01 | 3963 | 0.000000 | NMT | Yes | Training corpus is a mix of OpenSubtitles, JESC and BSD. The Marian-NMT toolkit was used to train a transformer, fine-tuned on a source-original synthetic corpus.
12 | ut-mrt    | BSDja-en | 2020/09/18 19:35:10 | 3966 | 0.000000 | NMT | Yes | Transformer-base ensemble of the 2 best models trained on large batches of the full BSD corpus (80k), the AMI meeting corpus, Ontonotes 5.0 and the WMT 2020 corpora; tuned on the full BSD corpus.
13 | adapt-dcu | BSDja-en | 2020/09/18 20:07:07 | 3970 | 0.000000 | NMT | Yes | The Marian-NMT toolkit was used to train a transformer, fine-tuned on a source-original synthetic corpus as well as the BSD training corpus.
14 | DEEPNLP   | BSDja-en | 2020/09/19 14:57:57 | 4047 | 0.000000 | NMT | No  | An ensemble of two transformer models trained on the BSD corpus (20k).
15 | DEEPNLP   | BSDja-en | 2020/09/19 15:00:04 | 4048 | 0.000000 | NMT | Yes | An ensemble of transformer models trained on several publicly available JA-EN datasets such as JESC, KFTT and MTNT, then fine-tuned on filtered back-translated data, followed by fine-tuning on BSD.
16 | ut-mrt    | BSDja-en | 2020/09/19 18:55:02 | 4059 | 0.000000 | NMT | No  | Transformer-small (4-layer) trained on the BSD corpus from GitHub (20k) with one previous context sentence. Average of the four best models.
17 | ut-mrt    | BSDja-en | 2020/09/19 19:06:00 | 4065 | 0.000000 | NMT | No  | Transformer-small (4-layer) trained on the BSD corpus from GitHub (20k). Average of the four best models.
18 | ut-mrt    | BSDja-en | 2020/09/19 20:20:28 | 4067 | 0.000000 | NMT | Yes | Transformer-base trained on data from WMT 2020 plus the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0.
19 | ut-mrt    | BSDja-en | 2020/09/19 20:25:23 | 4069 | 0.000000 | NMT | Yes | Transformer-base trained on data from WMT 2020 without any BSD.
20 | ut-mrt    | BSDja-en | 2020/09/19 20:46:38 | 4072 | 0.000000 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0, with domain tags to separate each corpus and one previous context sentence. Average of the 4 best models.
21 | ut-mrt    | BSDja-en | 2020/09/19 20:47:47 | 4073 | 0.000000 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0, with domain tags to separate each corpus. Average of the 4 best models.
22 | ut-mrt    | BSDja-en | 2020/09/19 20:54:35 | 4075 | 0.000000 | NMT | Yes | Transformer-base trained on document-aligned news data, the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0, with one previous context sentence.
23 | DEEPNLP   | BSDja-en | 2020/10/16 18:03:12 | 4162 | 0.000000 | NMT | No  |


HUMAN (WAT2022)


(no entries)

HUMAN (WAT2021)


(no entries)

HUMAN (WAT2020)


# | Team      | Task     | Date/Time           | DataID | HUMAN | Method | Other Resources | System Description
1 | goku20    | BSDja-en | 2020/09/15 19:41:51 | 3747 | 4.190 | NMT | Yes | mBART pre-training, doc-level ensembled model, JESC parallel corpus.
2 | DEEPNLP   | BSDja-en | 2020/09/19 15:00:04 | 4048 | 4.100 | NMT | Yes | An ensemble of transformer models trained on several publicly available JA-EN datasets such as JESC, KFTT and MTNT, then fine-tuned on filtered back-translated data, followed by fine-tuning on BSD.
3 | adapt-dcu | BSDja-en | 2020/09/17 23:50:55 | 3836 | 3.930 | NMT | Yes | Training corpus is a mix of OpenSubtitles, JESC and BSD. The Marian-NMT toolkit was used to train a transformer, fine-tuned on the same corpus oversampled with BSD.
4 | ut-mrt    | BSDja-en | 2020/09/19 20:47:47 | 4073 | 3.620 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0, with domain tags to separate each corpus. Average of the 4 best models.
5 | goku20    | BSDja-en | 2020/09/15 19:43:02 | 3748 | 3.570 | NMT | No  | mBART pre-training, sentence-level single model.
6 | ut-mrt    | BSDja-en | 2020/09/19 20:46:38 | 4072 | 3.550 | NMT | Yes | Transformer-base trained on the full BSD corpus (80k), the AMI meeting corpus and Ontonotes 5.0, with domain tags to separate each corpus and one previous context sentence. Average of the 4 best models.
7 | DEEPNLP   | BSDja-en | 2020/09/19 14:57:57 | 4047 | 2.400 | NMT | No  | An ensemble of two transformer models trained on the BSD corpus (20k).


HUMAN (WAT2019)


(no entries)

HUMAN (WAT2018)


(no entries)

HUMAN (WAT2017)


(no entries)

HUMAN (WAT2016)


(no entries)

HUMAN (WAT2015)


(no entries)

HUMAN (WAT2014)


(no entries)

EVALUATION RESULTS USAGE POLICY

When you use the WAT evaluation results for any purpose, such as:
- writing technical papers,
- making presentations about your system, or
- advertising your MT system to customers,
you may use the information about translation directions, scores (both automatic and human evaluations) and the rank of your system among the others. You may also use the scores of the other systems, but you MUST anonymize the other systems' names. In addition, you may show links (URLs) to the WAT evaluation result pages.

NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2018-08-02