The 6th Workshop on Asian Translation
November 3 or 4, 2019
Hong Kong, China
Many Asian countries are growing rapidly, and the importance of communicating and exchanging information with them has intensified.
To satisfy the demand for communication among these countries, machine translation technology is essential.
Machine translation technology has evolved rapidly and is seeing practical use, especially between European languages.
However, translation quality for Asian languages is still low compared to that for European languages,
and machine translation technology for these languages has not yet reached widespread practical use.
This is due not only to the lack of language resources for Asian languages
but also to the lack of techniques for correctly transferring the meaning of sentences to and from Asian languages.
Consequently, a place for gathering and sharing the resources and knowledge about Asian language translation is necessary to enhance machine translation research for Asian languages.
The Workshop on Machine Translation (WMT), the world's largest machine translation workshop, mainly targets European languages and does not include Asian languages.
The International Workshop on Spoken Language Translation (IWSLT) has spoken language translation tasks for some Asian languages using TED talk data,
but there is no task for written language.
The Workshop on Asian Translation (WAT) is an open machine translation evaluation campaign focusing on Asian languages.
WAT gathers and shares the resources and knowledge of Asian language translation to understand the problems to be solved
for the practical use of machine translation technologies among all Asian countries.
WAT is unique in that it is an "open innovation platform":
the test data is fixed and open, so participants can repeat evaluations on the same data and confirm changes in translation accuracy over time.
WAT has no deadline for the automatic translation quality evaluation (continuous evaluation), so participants can submit translation results at any time.
Following the success of the previous WAT workshops (WAT2014 -- WAT2018),
WAT2019 will bring together machine translation researchers and users to try, evaluate, share and discuss brand-new ideas about machine translation.
TRANSLATION TASK
- Scientific paper tasks: Asian Scientific Paper Excerpt Corpus (ASPEC)
- English <--> Japanese
- Chinese <--> Japanese
- ... TBA
TRANSLATION TASK EVALUATION
We will evaluate the translation performance of the submitted results
through automatic evaluation and human evaluation.
We will prepare an automatic evaluation server.
You will be able to evaluate the translation results at any time using this server.
BLEU, RIBES, and AM-FM*
* AM-FM is a two-dimensional automatic evaluation metric for machine translation, designed to operate at the sentence level.
It is based on adequacy and fluency, decoupling the semantic and syntactic components
of the translation process to provide a balanced view of translation quality.
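For reference, here is a minimal local-scoring sketch in Python, assuming the sacrebleu package is installed; the official evaluation server computes BLEU, RIBES, and AM-FM itself, and the am_fm helper below is only an illustrative adequacy/fluency combination with an assumed weight, not the official implementation.

    # Minimal local BLEU check with sacrebleu, plus an illustrative
    # AM-FM style combination (not the official implementation).
    import sacrebleu

    hypotheses = ["the cat sat on the mat"]   # system outputs, one per line
    references = ["the cat sat on the mat"]   # reference translations

    # corpus_bleu takes a list of hypotheses and a list of reference streams.
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    print(f"BLEU = {bleu.score:.2f}")

    def am_fm(adequacy: float, fluency: float, lam: float = 0.5) -> float:
        """Weighted mix of an adequacy (semantic) score and a fluency
        (syntactic) score; the weight lam = 0.5 is an assumption."""
        return lam * adequacy + (1.0 - lam) * fluency

    print(f"AM-FM = {am_fm(0.8, 0.9):.2f}")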
The submission format and the submission method are given at the submission site below.
When submitting, participants must agree that the submitted results
are attributed to JST and NICT, and that the results will be used and
distributed for research purposes by JST and NICT.
Thanks to the technical collaborators: Luis Fernando D'Haro, Rafael E. Banchs and Haizhou Li.
Human evaluation will be carried out with two methods:
Pairwise Crowdsourcing Evaluation and JPO Adequacy Evaluation.
- Pairwise Crowdsourcing Evaluation:
Pairwise evaluation will be carried out using crowdsourcing.
Organizers will sample 400 sentences from the test data for the pairwise
crowdsourcing evaluation.
Participants can submit translation results for human evaluation
at most twice before the pairwise crowdsourcing evaluation deadline.
(For automatic evaluation, there is no limit on the number of submissions.)
Each submission is evaluated sentence by sentence in a pairwise comparison
with the baseline system.
The crowdsourcing workers will be asked to judge which of the two translations
is better in terms of adequacy and fluency.
To guarantee the quality of the evaluation, each sentence is evaluated by five
different workers, and the final decision is made by majority voting over their
judgements (see the sketch at the end of this subsection).
The submission format is the same as that of automatic evaluation.
Participants can select their translation results from the ones submitted
for automatic evaluation.
All systems will be ranked by the percentage of translations judged
to improve upon the baseline system.
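Here is a minimal sketch of how the five-worker majority vote and the win-rate ranking could work; the "win"/"loss"/"tie" labels and the handling of ties are assumptions for illustration, not the organizers' exact procedure.

    from collections import Counter

    # Hypothetical judgements from 5 workers per sentence: "win" means the
    # submitted translation was judged better than the baseline.
    votes_per_sentence = [
        ["win", "win", "loss", "win", "tie"],
        ["loss", "loss", "win", "loss", "loss"],
        ["win", "tie", "win", "win", "win"],
    ]

    def majority(votes):
        """Final label for one sentence: the most frequent of 5 judgements."""
        label, _count = Counter(votes).most_common(1)[0]
        return label

    decisions = [majority(v) for v in votes_per_sentence]

    # Systems are ranked by the share of sentences judged better than baseline.
    win_rate = decisions.count("win") / len(decisions)
    print(f"judged better than baseline: {win_rate:.0%}")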
- JPO Adequacy Evaluation:
We will also evaluate translations using the Content Transmission Level Evaluation criteria defined by the JPO
(pages 5 to 8 of the PDF file; the page is in Japanese).
We will sample 200 sentences from the pairwise crowdsourcing evaluation data
for the JPO adequacy evaluation.
The JPO adequacy evaluation will be conducted only for the translation results of the three top-scoring teams
in the Pairwise Crowdsourcing Evaluation for each subtask's language pair.
ORGANIZERS
- Toshiaki Nakazawa, The University of Tokyo, Japan
- Chenchen Ding, National Institute of Information and Communications Technology (NICT), Japan
- Shohei Higashiyama, National Institute of Information and Communications Technology (NICT), Japan
- Raj Dabre, National Institute of Information and Communications Technology (NICT), Japan
- Anoop Kunchukuttan, Microsoft AI and Research, India
- Win Pa Pa, University of Computer Studies, Yangon (UCSY), Myanmar
- Isao Goto, Japan Broadcasting Corporation (NHK), Japan
- Hideya Mino, Japan Broadcasting Corporation (NHK), Japan
- Katsuhito Sudoh, Nara Institute of Science and Technology (NAIST), Japan
- Graham Neubig, Carnegie Mellon University (CMU), USA
- Hideto Kazawa, Google, Japan
- Yusuke Oda, Google, Japan
- Jun Harashima, Cookpad Inc., Japan
- Sadao Kurohashi, Kyoto University, Japan
- Ir. Hammam Riza, Agency for the Assessment and Application of Technology (BPPT), Indonesia
- Pushpak Bhattacharyya, Indian Institute of Technology Patna (IITP), India
- Luis Fernando D'Haro, Institute for Infocomm Research, Singapore
- Rafael E. Banchs, Institute for Infocomm Research, Singapore
- Haizhou Li, Institute for Infocomm Research, Singapore
For questions, comments, etc., please email "wat -at- nlp -dot- ist -dot- i -dot- kyoto -hyphen- u -dot- ac -dot- jp".
Japan Patent Office
The Association for Natural Language Processing (page in Japanese)
National Institute of Information and Communications Technology (NICT)
PREVIOUS WORKSHOPS
- WAT 2018
The 5th Workshop on Asian Translation (WAT 2018) was held in December 2018 in Hong Kong, China.
- WAT 2017
The 4th Workshop on Asian Translation (WAT 2017) was held in November 2017 in Taipei, Taiwan.
- WAT 2016
The 3rd Workshop on Asian Translation (WAT 2016) was held in December 2016 in Osaka, Japan.
- WAT 2015
The 2nd Workshop on Asian Translation (WAT 2015) was held in October 2015 in Kyoto, Japan.
- WAT 2014
The 1st Workshop on Asian Translation (WAT 2014) was held in October 2014 in Tokyo, Japan.
2019-02-01: site opened