WAT 2017

The 4th Workshop on Asian Translation

November 27, 2017
Taipei

Photos are now available.
New subtasks: Newswire subtask (Ja <-> En) and Recipe subtask (Ja <-> En) (2017/06/17)

INTRODUCTION

Many Asian countries are growing rapidly, and the importance of communicating and exchanging information with these countries has intensified. To satisfy the demand for communication among these countries, machine translation technology is essential.

Machine translation technology has evolved rapidly in recent years and is seeing practical use, especially between European languages. However, the translation quality for Asian languages is still lower than that for European languages, and machine translation technology for these languages has not yet come into widespread use. This is due not only to the lack of language resources for Asian languages but also to the lack of techniques for correctly transferring the meaning of sentences from/to Asian languages. Consequently, a venue for gathering and sharing resources and knowledge about Asian language translation is necessary to advance machine translation research for Asian languages.

The Workshop on Machine Translation (WMT), the world's largest machine translation workshop, mainly targets European languages, with little coverage of Asian languages. The International Workshop on Spoken Language Translation (IWSLT) has spoken language translation tasks for some Asian languages using TED talk data, but there is no task for written language.

The Workshop on Asian Translation (WAT) is an open machine translation evaluation campaign focusing on Asian languages. WAT gathers and shares the resources and knowledge of Asian language translation to understand the problems to be solved for the practical use of machine translation technologies among all Asian countries. WAT is unique in that it is an "open innovation platform": the test data is fixed and open, so participants can repeat evaluations on the same data and confirm changes in translation accuracy over time. WAT has no deadline for the automatic translation quality evaluation (continuous evaluation), so participants can submit translation results at any time.

Following the success of the previous WAT workshops (WAT2014, WAT2015, WAT2016), WAT2017 will bring together machine translation researchers and users to try, evaluate, share and discuss brand-new ideas about machine translation. For the 4th WAT, we will include two new translation subtasks:

Japanese-English newspaper translation subtask
Japanese-English recipe translation subtask

We will also include a brand-new task: the small NMT task. The goal of this task is to build a small neural machine translation system while maintaining reasonable translation quality. There is high demand in industry to equip smart devices with translation capabilities. Although neural machine translation has reached the point where such capability is no longer a dream, it usually requires large computing resources that are not available on everyday devices. The current solution is to run a translation engine on powerful servers and have the device communicate with them over the Internet. However, reliable low-latency connections are not available in most parts of the world and will not be in the short term. If we can build a small system that keeps reasonable translation quality, it will have a huge impact on the application of machine translation.

Unfortunately, almost all neural machine translation research focuses on improving quality, with little consideration of computing resources at inference time. We hope this shared task provides a common language and assets to the NLP community to open a new research field, which will have a huge impact on cross-language communication in our society.

In addition to the shared tasks, the workshop will also feature scientific papers on topics related to machine translation, especially for Asian languages. Topics of interest include, but are not limited to:

analysis of the automatic/human evaluation results in the past WAT workshops
word-/phrase-/syntax-/semantics-/rule-based, neural and hybrid machine translation
Asian language processing
incorporating linguistic information into machine translation
decoding algorithms
system combination
error analysis
manual and automatic machine translation evaluation
machine translation applications
quality estimation
domain adaptation
machine translation for low resource languages
language resources

SunFlare Co., Ltd.

TIMETABLE

9:00 - 9:20	Welcome & Overview
	Overview of the 4th Workshop on Asian Translation
	Toshiaki Nakazawa, Shohei Higashiyama, Chenchen Ding, Hideya Mino, Isao Goto, Hideto Kazawa, Yusuke Oda, Graham Neubig and Sadao Kurohashi
9:20 - 10:40	Research Paper
	Controlling Target Features in Neural Machine Translation via Prefix Constraints
	Shunsuke Takeno, Masaaki Nagata and Kazuhide Yamamoto
	Improving Japanese-to-English Neural Machine Translation by Paraphrasing the Target Language
	Yuuki Sekizawa, Tomoyuki Kajiwara and Mamoru Komachi
	Improving Low-Resource Neural Machine Translation with Filtered Pseudo-Parallel Corpus
	Aizhan Imankulova, Takayuki Sato and Mamoru Komachi
	Japanese to English/Chinese/Korean Datasets for Translation Quality Estimation and Automatic Post-Editing
	Atsushi Fujita and Eiichiro Sumita
10:40 - 11:00	Coffee Break
11:00 - 12:00	System Description
	NTT Neural Machine Translation Systems at WAT 2017
	Makoto Morishita, Jun Suzuki and Masaaki Nagata
	XMU Neural Machine Translation Systems for WAT 2017
	Boli Wang, Zhixing Tan, Jinming Hu, Yidong Chen and Xiaodong Shi
	A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size
	Masato Neishi, Jin Sakuma, Satoshi Tohda, Shonosuke Ishiwatari, Naoki Yoshinaga and Masashi Toyoda
12:00 - 14:00	Lunch
14:00 - 14:45	Invited Talk:
	Turning NMT research into commercial products
	Dr. Adrià de Gispert
14:45 - 15:05	Poster Booster (9 papers)
15:05 - 15:10	Commemorative Photo
15:10 - 15:30	Coffee Break
15:30 - 16:15	Poster Presentation I
	NTT Neural Machine Translation Systems at WAT 2017
	Makoto Morishita, Jun Suzuki and Masaaki Nagata
	Patent NMT integrated with Large Vocabulary Phrase Translation by SMT at WAT 2017
	Zi Long, Ryuichiro Kimura, Takehito Utsuro, Tomoharu Mitsuhashi and Mikio Yamamoto
	SMT reranked NMT
	Terumasa Ehara
	Ensemble and Reranking: Using Multiple Models in the NICT-2 Neural Machine Translation System at WAT2017
	Kenji Imamura and Eiichiro Sumita
	A Simple and Strong Baseline: NAIST-NICT Neural Machine Translation System for WAT2017 English-Japanese Translation Task
	Yusuke Oda, Katsuhito Sudoh, Satoshi Nakamura, Masao Utiyama and Eiichiro Sumita
	Comparison of SMT and NMT trained with large Patent Corpora: Japio at WAT2017
	Satoshi Kinoshita, Tadaaki Oshio and Tomoharu Mitsuhashi
16:15 - 17:00	Poster Presentation II
	A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size
	Masato Neishi, Jin Sakuma, Satoshi Tohda, Shonosuke Ishiwatari, Naoki Yoshinaga and Masashi Toyoda
	Kyoto University Participation to WAT 2017
	Fabien Cromieres, Raj Dabre, Toshiaki Nakazawa and Sadao Kurohashi
	CUNI NMT System for WAT 2017 Translation Tasks
	Tom Kocmi, Dušan Variš and Ondřej Bojar
	XMU Neural Machine Translation Systems for WAT 2017
	Boli Wang, Zhixing Tan, Jinming Hu, Yidong Chen and Xiaodong Shi
	Tokyo Metropolitan University Neural Machine Translation System for WAT 2017
	Yukio Matsumura and Mamoru Komachi
	Comparing Recurrent and Convolutional Architectures for English-Hindi Neural Machine Translation
	Sandhya Singh, Ritesh Panjwani, Anoop Kunchukuttan and Pushpak Bhattacharyya
17:00 -	Closing

INVITED TALK

Dr. Adrià de Gispert
Senior Research Scientist, SDL Research
Senior Research Associate, Department of Engineering, University of Cambridge

Title: Turning NMT research into commercial products

Time: 14:00 - 14:45

IMPORTANT DATES

Translation Task Submission Deadline	July 31, 2017
Small NMT Task Submission Deadline	August 20, 2017
Research Paper and System Description Submission Deadline	September 5, 2017
Notification of Research Paper Acceptance and Review Feedback of System Description	September 30, 2017
Camera-ready Deadline	October 10, 2017

* All deadlines are 11:59 PM UTC-7.
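Because the deadlines use a fixed UTC-7 offset, each one falls at 06:59 UTC on the following day. A minimal Python sketch of the conversion, assuming a fixed offset with no daylight-saving adjustment (`deadline_in_utc` is a hypothetical helper, not part of any WAT tooling):

```python
from datetime import datetime, timedelta, timezone

# Deadlines are stated as 11:59 PM at a fixed UTC-7 offset (no DST handling assumed).
UTC_MINUS_7 = timezone(timedelta(hours=-7))


def deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return the 11:59 PM UTC-7 deadline on the given date, expressed in UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=UTC_MINUS_7)
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    # Translation Task Submission Deadline: July 31, 2017
    print(deadline_in_utc(2017, 7, 31).isoformat())  # 2017-08-01T06:59:00+00:00
```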

TRANSLATION TASK

The task is to improve the text translation quality for Asian languages including Japanese, Chinese, Korean, Hindi, Indonesian and English. Participants choose any of the subtasks in which they would like to participate and translate the test data using their machine translation systems. The organizers will evaluate the submissions using automatic evaluation and human evaluation. We also provide a baseline machine translation system using Moses.

Subtasks:

Scientific papers subtasks: Asian Scientific Paper Excerpt Corpus (ASPEC)
- English <--> Japanese
- Chinese <--> Japanese
Patents subtasks: Japan Patent Office Patent Corpus (JPC)
- Chinese <--> Japanese
- Korean <--> Japanese
- English <--> Japanese
Newswire subtasks:
- ~~Indonesian <--> English (Badan Pengkajian dan Penerapan Teknologi (BPPT) Corpus)~~
- Japanese <--> English (NEW!! JIJI Corpus)
Mixed domain subtasks: IIT Bombay (IITB) Corpus
- Hindi <--> English
- Hindi <--> Japanese
Recipe subtask (NEW!!): Cookpad Comparable Corpus
- Japanese <--> English

Dataset:

Scientific paper subtasks:
WAT uses ASPEC for training, development, development test and test data. Participants in the scientific paper subtasks must obtain a copy of ASPEC themselves. ASPEC consists of approximately 3 million Japanese-English parallel sentences from paper abstracts (ASPEC-JE) and approximately 0.7 million Japanese-Chinese parallel sentences from paper excerpts (ASPEC-JC).
Patent subtasks:
WAT uses the JPO Patent Corpus, constructed by the Japan Patent Office (JPO). This corpus consists of 1 million English-Japanese, 1 million Chinese-Japanese, and 1 million Korean-Japanese parallel sentences from patent descriptions in four categories. Participants in the patent subtasks must obtain it from the WAT2017 site of the JPO Patent Corpus.
Newswire subtasks:
- NEW!!! Japanese <--> English:
  WAT uses the JIJI Corpus, constructed by Jiji Press Ltd. in collaboration with the National Institute of Information and Communications Technology (NICT). This corpus consists of 200K Japanese-English parallel sentences from Jiji Press news articles in various categories. Participants in the newswire subtask must obtain it from the WAT2017 site of the JIJI Corpus.
Mixed domain subtasks:
- Hindi <--> English
  WAT uses the IITB Corpus for training, development, development test and test data. The training corpus is a mixed-domain corpus containing around 1 million lines of sentences and phrases. To access the corpus, participants should sign the agreement, scan it, and send it to the address given in it. The development and test sets are from the News domain and are exactly the same as those used in WMT 2014.
- Hindi <--> Japanese Pivot Language Task
  This year we are introducing a pivot language task for the first time. For this task, participants can use the following corpora:
  1. A parallel corpus created from openly available corpora.
  2. The Hindi-English (IITB) task corpus and the English-Japanese (ASPEC) task corpus, for pivoting. To triangulate the source-pivot and pivot-target phrase tables, participants can use the scripts provided by MultiMT.
  The objective of this task is to compare a baseline system trained only on a mixed-domain parallel corpus with a system that additionally uses mixed-domain corpora by means of pivoting.
NEW!!! Recipe subtask:
WAT uses the Recipe Corpus, constructed by Cookpad Inc. This corpus consists of 16,282 Japanese-English parallel sentences from recipes. Participants in the recipe subtask must obtain it from the WAT2017 site of the Recipe Corpus.
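For the Hindi <--> Japanese pivot task, phrase-table triangulation combines a source-pivot table and a pivot-target table by marginalising over pivot phrases: p(t | s) = Σ_p p(t | p) · p(p | s). The Python sketch below illustrates that idea on toy dictionaries. It is not the MultiMT scripts; their file formats and additional features (such as lexical weights and inverse-direction probabilities) are not reproduced here, and the phrase pairs and probabilities are made up for illustration.

```python
from collections import defaultdict

# A phrase table maps a source phrase to {target phrase: translation probability}.
PhraseTable = dict[str, dict[str, float]]


def triangulate(src_pivot: PhraseTable, pivot_tgt: PhraseTable) -> PhraseTable:
    """Build a source->target table via the pivot language:
    p(t | s) = sum over pivot phrases p of p(t | p) * p(p | s)."""
    src_tgt: PhraseTable = {}
    for src, pivots in src_pivot.items():
        scores = defaultdict(float)
        for pivot, p_pivot_given_src in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt_given_pivot in pivot_tgt.get(pivot, {}).items():
                scores[tgt] += p_tgt_given_pivot * p_pivot_given_src
        if scores:
            src_tgt[src] = dict(scores)
    return src_tgt


if __name__ == "__main__":
    # Toy Hindi->English and English->Japanese entries (illustrative values only).
    hi_en = {"घर": {"house": 0.6, "home": 0.4}}
    en_ja = {"house": {"家": 0.7, "住宅": 0.3}, "home": {"家": 0.9, "自宅": 0.1}}
    for tgt, prob in sorted(triangulate(hi_en, en_ja)["घर"].items(), key=lambda kv: -kv[1]):
        print(tgt, round(prob, 2))
```

In this toy example 家 receives 0.6 × 0.7 + 0.4 × 0.9 = 0.78, because both English pivots lead to it; this is why triangulation can recover target phrases that no single pivot path supports strongly.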

TRANSLATION TASK EVALUATION

We will evaluate the submitted translation results using both automatic evaluation and human evaluation.

Automatic evaluation:
We will prepare an automatic evaluation server. You will be able to evaluate the translation results at any time using this server.

Metric:
BLEU, RIBES, and AM-FM*
* AM-FM is a two-dimensional automatic evaluation metric for machine translation, designed to operate at the sentence level. It is based on adequacy and fluency, decoupling the semantic and syntactic components of the translation process to provide a balanced view of translation quality.
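For reference, BLEU combines clipped n-gram precisions p_n (n = 1 to 4) with a brevity penalty: BLEU = BP · exp((1/4) Σ log p_n), where BP = min(1, exp(1 − r/c)) for total reference length r and candidate length c. The Python sketch below follows that textbook definition for pre-tokenized text with a single reference per sentence and no smoothing. It is an illustration only; scores from the WAT evaluation server depend on its own tokenization and implementation and may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one reference per sentence, no smoothing.
    hypotheses, references: lists of token lists (assumed pre-tokenized)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipping: a hypothesis n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

A candidate identical to its reference scores 1.0, while a perfectly matching but shorter candidate is scaled down only by the brevity penalty.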
Format:
The submission format and the submission method are given at the submission site below.
Notice:
When submitting, participants have to agree that the submitted results are attributed to JST and NICT. The results will be used and distributed for research by JST and NICT.
Thanks to the technical collaborators: Luis Fernando D'Haro, Rafael E. Banchs and Haizhou Li.

Human evaluation:
Human evaluation will be carried out using two methods: Pairwise Crowdsourcing Evaluation and JPO Adequacy Evaluation.

Pairwise Crowdsourcing Evaluation:
The organizers will sample 400 sentences from the test data for the pairwise crowdsourcing evaluation. Participants may submit translation results for human evaluation at most twice before the pairwise crowdsourcing evaluation deadline. (There is no limit on the number of submissions for automatic evaluation.)
- Metric:
  Sentence-by-sentence pairwise evaluation against the baseline system. Crowdsourcing workers will be asked to judge which of the two translations is better in terms of adequacy and fluency. To guarantee the quality of the evaluation, each sentence is evaluated by 5 different workers, and the final decision is made by voting on their judgements.
- Format:
  The submission format is the same as that of automatic evaluation. Participants can select their translation results from the ones submitted for automatic evaluation.
- Ranking:
  All systems will be ranked by the percentage of translations judged to improve upon the baseline system.
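The per-sentence voting and the ranking criterion can be sketched in a few lines of Python. This assumes each of the five workers gives a binary judgement (system better or baseline better), so a strict majority always exists; the label strings and function names are hypothetical, and the organizers' actual aggregation may handle details such as ties differently.

```python
def system_wins(votes: list[str]) -> bool:
    """True if a strict majority of workers preferred the submitted system.
    votes: one label per worker, 'system' or 'baseline' (hypothetical labels)."""
    return votes.count("system") > len(votes) / 2


def improvement_rate(votes_per_sentence: list[list[str]]) -> float:
    """Percentage of evaluated sentences judged better than the baseline."""
    wins = sum(system_wins(votes) for votes in votes_per_sentence)
    return 100.0 * wins / len(votes_per_sentence)


if __name__ == "__main__":
    judgements = [
        ["system", "system", "baseline", "system", "baseline"],  # 3 of 5: system wins
        ["baseline", "baseline", "system", "baseline", "baseline"],  # baseline wins
    ]
    print(improvement_rate(judgements))  # 50.0
```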

JPO Adequacy Evaluation:
We will also evaluate translations using the Content Transmission Level Evaluation criteria defined by the JPO (pages 5 to 8 of the PDF file (Japanese page)). We will sample 200 sentences from the pairwise crowdsourcing evaluation data for the JPO adequacy evaluation. The JPO adequacy evaluation will be conducted only for the translation results of the top 3 teams in the Pairwise Crowdsourcing Evaluation for each language pair of each subtask.
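The two human-evaluation sets are nested: the 200 JPO sentences are drawn from the 400 pairwise sentences, and only the three best teams per language pair proceed. A small Python sketch of that selection logic, assuming uniform random sampling with a fixed seed (the actual sampling procedure is not described on this page); the function names, team names and scores are hypothetical:

```python
import random


def sample_evaluation_sets(test_ids, n_pairwise=400, n_jpo=200, seed=0):
    """Sample the pairwise set from the test data, then the JPO set from the pairwise set."""
    rng = random.Random(seed)
    pairwise = rng.sample(list(test_ids), n_pairwise)
    jpo = rng.sample(pairwise, n_jpo)  # nested: every JPO sentence is also in the pairwise set
    return pairwise, jpo


def top_teams(pairwise_scores, k=3):
    """Return the k teams with the highest pairwise crowdsourcing score for one language pair."""
    return sorted(pairwise_scores, key=pairwise_scores.get, reverse=True)[:k]


if __name__ == "__main__":
    pairwise, jpo = sample_evaluation_sets(range(1000))  # hypothetical 1,000-sentence test set
    print(len(pairwise), len(jpo), set(jpo) <= set(pairwise))  # 400 200 True
    scores = {"team_a": 12.5, "team_b": 30.0, "team_c": -4.0, "team_d": 21.0}  # made-up scores
    print(top_teams(scores))  # ['team_b', 'team_d', 'team_a']
```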

Submission:

The submission site is now open.
(A user name and password are required for access.)

Evaluation results:

The evaluation results site is now open.

SMALL NMT TASK (NEW!!)

See the separate Small NMT Task page.

PAPER SUBMISSION INFORMATION

Participants who submit results for human evaluation are required to submit description papers of their translation systems and evaluation results. All submissions and feedback are handled electronically as follows.

Format and Template:
Participants must use the same format as IJCNLP 2017 long papers and follow the same formatting instructions.
We ask you to use the IJCNLP 2017 LaTeX style files or Microsoft Word template, available on the IJCNLP 2017 conference web site.
Submission site:
https://www.softconf.com/ijcnlp2017/wat2017

TRANSLATION RESULT SUBMISSION INFORMATION

The application site for WAT2017 task participants is now open.

REGISTRATION

TBA

ORGANIZERS

Toshiaki Nakazawa, Japan Science and Technology Agency (JST), Japan
Hideya Mino, National Institute of Information and Communications Technology (NICT), Japan
Chenchen Ding, National Institute of Information and Communications Technology (NICT), Japan
Shohei Higashiyama, National Institute of Information and Communications Technology (NICT), Japan
Isao Goto, Japan Broadcasting Corporation (NHK), Japan
Graham Neubig, Carnegie Mellon University (CMU), USA
Hideto Kazawa, Google, Japan
Yusuke Oda, Nara Institute of Science and Technology (NAIST), Japan
Jun Harashima, Cookpad Inc., Japan
Sadao Kurohashi, Kyoto University, Japan
Ir. Hammam Riza, Agency for the Assessment and Application of Technology (BPPT), Indonesia
Pushpak Bhattacharyya, Indian Institute of Technology Bombay (IITB), India

PROGRAM COMMITTEE MEMBERS

(Tentative)

Chenhui Chu, JST, Japan
Fabien Cromières, JST, Japan
Hideto Kazawa, Google, Japan
Anoop Kunchukuttan, IIT Bombay, India
Qun Liu, Dublin City University, Ireland
Yvette Graham, Dublin City University, Ireland
Yang Liu, Tsinghua University, China
Liling Tan, Universität des Saarlandes, Germany
Masao Utiyama, NICT, Japan
Jiajun Zhang, Chinese Academy of Sciences, China

TECHNICAL COLLABORATOR

Luis Fernando D'Haro, Institute for Infocomm Research, Singapore
Rafael E. Banchs, Institute for Infocomm Research, Singapore
Haizhou Li, Institute for Infocomm Research, Singapore

CONTACT

For questions, comments, etc. please email to "wat -at- nlp -dot- ist -dot- i -dot- kyoto -hyphen- u -dot- ac -dot- jp".

Japan Patent Office
JPO Patent Corpus
The Association for Natural Language Processing (Japanese Page)
Asian Scientific Paper Excerpt Corpus (ASPEC)
Japan Science and Technology Agency (JST)
National Institute of Information and Communications Technology (NICT)

PREVIOUS WORKSHOPS

WAT 2016
The 3rd Workshop on Asian Translation (WAT 2016) was held in December 2016 in Osaka, Japan.
WAT 2015
The 2nd Workshop on Asian Translation (WAT 2015) was held in October 2015 in Kyoto, Japan.
WAT 2014
The 1st Workshop on Asian Translation (WAT 2014) was held in October 2014 in Tokyo, Japan.

CHANGE LOG

2017-08-28: paper submission link updated
2017-08-21: recipe task related information updated
2017-07-21: baseline systems updated
2017-07-21: description of recipe subtask updated
2017-06-23: important dates updated
2017-06-23: application information updated
2017-06-16: corpus information updated
2017-05-10: main information updated
2017-04-26: organizer list updated
2017-04-13: site open

JST (Japan Science and Technology Agency)
NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2017-08-28
