WAT 2018
The 5th Workshop on Asian Translation
December 3, 2018
Hong Kong, China
PHOTOS are now open.
The proceedings are now available here
The translation task submission deadline is extended to 2018/09/15
The system description submission deadline is extended to 2018/11/09
The camera-ready submission deadline is extended to 2018/11/29
INTRODUCTION
Many Asian countries are growing rapidly, and communicating and exchanging information with these countries has become increasingly important.
To satisfy the demand for communication among these countries, machine translation technology is essential.
Machine translation technology has evolved rapidly in recent years and is seeing practical use, especially between European languages.
However, the translation quality for Asian languages is not as high as that for European languages,
and machine translation technology for these languages has not yet reached the stage of widespread use.
This is due not only to the lack of language resources for Asian languages
but also to the lack of techniques for correctly transferring the meaning of sentences from/to Asian languages.
Consequently, a place for gathering and sharing the resources and knowledge about Asian language translation is necessary to enhance machine translation research for Asian languages.
The Workshop on Machine Translation (WMT), the world's largest machine translation workshop, mainly targets European languages and does not include Asian languages.
The International Workshop on Spoken Language Translation (IWSLT) has spoken language translation tasks for some Asian languages using TED talk data,
but there is no task for written language.
The Workshop on Asian Translation (WAT) is an open machine translation evaluation campaign focusing on Asian languages.
WAT gathers and shares the resources and knowledge of Asian language translation to understand the problems to be solved
for the practical use of machine translation technologies among all Asian countries.
WAT is unique in that it is an "open innovation platform":
the test data is fixed and open, so participants can repeat evaluations on the same data and confirm changes in translation accuracy over time.
WAT has no deadline for the automatic translation quality evaluation (continuous evaluation), so participants can submit translation results at any time.
Following the success of the previous WAT workshops (WAT2014 -- WAT2017),
WAT2018 will bring together machine translation researchers and users to try, evaluate, share and discuss brand-new ideas about machine translation.
What's NEW in WAT2018 :
- additional test data for patent tasks
- Myanmar-English mixed domain tasks
- Indic languages multilingual tasks
- Multilingual NMT subtasks for Hindi-English and Hindi-Japanese mixed domain tasks
WAT2018 does NOT call for research papers, but you can submit them to PACLIC32.
SPONSOR
SunFlare Co., Ltd.
Kawamura International
AAMT
TIMETABLE
10:25 - 10:45 | Welcome & Overview |
| Overview of the 5th Workshop on Asian Translation |
| Toshiaki Nakazawa, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Win Pa Pa, Isao Goto, Hideya Mino, Katsuhito Sudoh, and Sadao Kurohashi |
10:45 - 11:10 | Poster Booster I |
11:10 - 12:25 | Poster Presentation I |
| TMU Japanese-Chinese Unsupervised NMT System for WAT 2018 Translation Task |
| Longtu Zhang, Yuting Zhao, and Mamoru Komachi |
| TMU Japanese-English Neural Machine Translation System using Generative Adversarial Network for WAT 2018 |
| Yukio Matsumura, Satoru Katsumata, and Mamoru Komachi |
| SMT reranked NMT (2) |
| Terumasa Ehara |
| Osaka University MT Systems for WAT 2018: Rewarding, Preordering, and Domain Adaptation |
| Yuki Kawara, Yuto Takebayashi, Chenhui Chu, and Yuki Arase |
| Combination of Statistical and Neural Machine Translation for Myanmar-English |
| Benjamin Marie, Atsushi Fujita, and Eiichiro Sumita |
| UCSYNLP-Lab Machine Translation Systems for WAT 2018 |
| Hsu Myat Mo, Yi Mon Shwe Sin, Thazin Myint Oo, Win Pa Pa, Khin Mar Soe, and Ye Kyaw Thu |
12:25 - 14:00 | Lunch |
14:00 - 15:25 | Panel Discussion |
15:25 - 15:30 | Commemorative Photo |
15:30 - 15:50 | Poster Booster II |
15:50 - 17:05 | Poster Presentation II |
| English-Myanmar NMT and SMT with Pre-ordering: NICT's machine translation systems at WAT-2018 |
| Rui Wang, Chenchen Ding, Masao Utiyama, and Eiichiro Sumita |
| NICT's Participation in WAT 2018: Approaches Using Multilingualism and Recurrently |
| Raj Dabre, Anoop Kunchukuttan, Atsushi Fujita, and Eiichiro Sumita |
| XMU Neural Machine Translation Systems for WAT2018 Myanmar-English Translation Task |
| Boli Wang, Jinming Hu, Yidong Chen, and Xiaodong Shi |
| SRCB Neural Machine Translation Systems in WAT 2018 |
| Yihan Li, Boyan Liu, Yixuan Tong, Shanshan Jiang and Bin Dong |
| The RGNLP Machine Translation Systems for WAT 2018 |
| Atul Kr. Ojha, Koel Dutta Chowdhury, Chao-Hong Liu, and Karan Saxena |
| Statistical Machine Translation Using 5-grams Word Segmentation in Decoding |
| Aye Thida, Nway Nway Han, and Sheinn Thawtar Oo |
17:05 - | Closing |
PANEL DISCUSSION
IMPORTANT DATES
Translation Task Submission Deadline | September 15, 2018
System Description Submission Deadline | November 9, 2018
Review Feedback of System Description | November 16, 2018
Camera-ready Deadline | November 29, 2018
* All deadlines are calculated at 11:59PM UTC-7
TRANSLATION TASK
Tasks:
- Scientific paper tasks: Asian Scientific Paper Excerpt Corpus (ASPEC)
  - English <--> Japanese
  - Chinese <--> Japanese
- Patent tasks: Japan Patent Office Patent Corpus 2.0 (JPC2)
  - Chinese <--> Japanese
  - Korean <--> Japanese
  - English <--> Japanese
  - Chinese -> Japanese expression pattern task [NEW!]
- Newswire tasks: JIJI Corpus
- Mixed domain tasks:
  - Hindi <--> English: IIT Bombay (IITB) Corpus
  - Hindi <--> Japanese
  - Myanmar (Burmese) <--> English: UCSY and ALT Corpora [NEW!]
- Recipe tasks: Cookpad Comparable Corpus
- Indic languages multilingual tasks: [NEW!]
  - Bengali/Hindi/Malayalam/Tamil/Telugu/Urdu/Sinhalese <--> English
Dataset:
- Scientific paper tasks:
  WAT uses ASPEC as the dataset, which includes training, development, development test and test data.
  Participants in the scientific paper tasks must obtain a copy of ASPEC themselves.
  ASPEC consists of approximately 3 million Japanese-English parallel sentences from paper abstracts (ASPEC-JE)
  and approximately 0.7 million Japanese-Chinese paper excerpts (ASPEC-JC).
- Patent tasks:
  WAT uses the JPO Patent Corpus, which is constructed by the Japan Patent Office (JPO).
  This corpus consists of 1 million English-Japanese parallel sentences, 1 million Chinese-Japanese parallel sentences,
  and 1 million Korean-Japanese parallel sentences from patent descriptions in four categories.
  Participants in the patent tasks are required to obtain it from the WAT2018 site for the JPO Patent Corpus.
  - English/Chinese/Korean <--> Japanese:
    These tasks evaluate the performance of a translation model in the same way as the other translation tasks.
    Unlike the previous tasks at WAT2015, WAT2016 and WAT2017,
    the new test sets for these tasks consist of (a) patent documents published between 2011 and 2013,
    which were used in past years' WAT, and (b) documents published between 2016 and 2017, for each language pair.
    We will also evaluate performance on section (a) so that systems can be compared with those submitted in past years' WAT.
  - Chinese -> Japanese expression pattern task:
    This task evaluates the performance of a translation model for each predefined category of expression patterns,
    which corresponds to title of invention (TIT), abstract (ABS), scope of claim (CLM) or description (DES).
    The test set for this task consists of sentences, each of which is annotated with
    the corresponding category of expression patterns.
- Newswire tasks:
  WAT uses the JIJI Corpus, which is constructed by Jiji Press Ltd.
  in collaboration with the National Institute of Information and Communications Technology (NICT).
  This corpus is a Japanese-English news corpus of 200K parallel sentences from Jiji Press news in various categories.
  Participants in the newswire tasks are required to obtain it from the WAT2018 site for the JIJI Corpus.
- Mixed domain tasks:
  - Hindi <--> English
    WAT uses the IITB Corpus as the dataset for training, development, development test and test data.
    The training corpus is mixed domain and contains around 1 million lines of sentences and phrases.
    To access the corpus, participants should sign the following agreement, scan it, and send it to the address mentioned in it.
    The development and test sets are from the News domain and are exactly the same as the ones in WMT 2014.
    - Vanilla subtask:
      Develop Hindi-English and English-Hindi MT systems using only the provided IITB English-Hindi Parallel and Monolingual corpora.
    - Multilingual NMT subtask:
      Use an additional XX-En corpus in multilingual NMT to improve Hi-En translation. Multilingual NMT can be done using Transfer Learning (Zoph et al. 2016) or using Joint Learning (Johnson et al. 2016).
      The choice of the additional corpus is up to the participant. One possible choice is the Arabic-English UN corpus of approximately 11 million lines.
  - Hindi <--> Japanese Pivot Language Task
    For the first time we are introducing a pivot language task. For this task participants can use the following corpora.
    - A parallel corpus (created using openly available corpora) which is located here.
    - The Hindi-English (IITB) task corpus and the English-Japanese (ASPEC) task corpus for pivoting. For triangulation of the source-pivot and pivot-target phrase tables, participants can use the scripts provided by: MultiMT.
    The objective of this task is to compare the performance of a baseline system built only on a mixed domain parallel corpus with that of a system that uses an additional mixed domain corpus by means of pivoting.
    As in the Hindi-English task, there are two subtasks:
    - Vanilla subtask (using only the provided parallel corpora)
    - Multilingual NMT subtask (learning a joint (Hindi+English)-Japanese model using the Hindi-Japanese and English-Japanese corpora)
  - Myanmar (Burmese) <--> English
    WAT uses the UCSY Corpus and the ALT Corpus.
    The UCSY corpus and a portion of the ALT corpus are used as training data,
    amounting to around 220,000 lines of sentences and phrases.
    The development and test data are from the ALT corpus.
- Recipe tasks:
  WAT uses the Recipe Corpus, which is constructed by Cookpad Inc.
  This corpus consists of 16,282 Japanese-English parallel sentences from recipes.
  Participants in the recipe tasks are required to obtain it from the WAT2018 site for the Recipe Corpus.
- Indic languages multilingual tasks:
  These tasks cover 7 Indic languages (Bengali, Hindi, Malayalam, Tamil, Telugu, Sinhalese and Urdu) and English. There are a total of 7 language directions. The focus is the spoken language domain, and the corpus used for these tasks (Indic Languages Multilingual Parallel Corpus) comes from the OpenSubtitles datasets from OPUS.
  - Pilot subtask
  - Transfer Learning subtask
  - Language Relatedness study subtask
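As an illustration of the Joint Learning approach (Johnson et al. 2016) mentioned for the Multilingual NMT subtasks above, the following is a minimal sketch of how several parallel corpora might be merged into one training set by prepending a target-language token to each source sentence. The token format "<2xx>", the function names, and the toy romanized sentence pairs are assumptions made for illustration; they are not part of the WAT2018 task definition or of any provided corpus.

```python
def tag_source(source_sentence, target_lang):
    """Prepend an artificial token naming the target language (assumed format)."""
    return f"<2{target_lang}> {source_sentence}"


def build_joint_corpus(corpora):
    """Merge several parallel corpora into one list of (tagged source, target) pairs.

    corpora: list of (target_lang, [(source_sentence, target_sentence), ...])
    """
    joint = []
    for target_lang, pairs in corpora:
        for src, tgt in pairs:
            joint.append((tag_source(src, target_lang), tgt))
    return joint


# Toy data only (hypothetical, romanized for readability).
hi_en = [("namaste", "hello")]
en_hi = [("hello", "namaste")]

joint = build_joint_corpus([("en", hi_en), ("hi", en_hi)])
for src, tgt in joint:
    print(src, "|||", tgt)
```

In a many-to-one setting, such as a joint (Hindi+English)-Japanese model, the target token is the same for every pair, so the corpora could also simply be concatenated; the token matters when one model serves several target languages.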
TRANSLATION TASK EVALUATION
We will evaluate the translation performance of the submitted results through
automatic evaluation and human evaluation.
Automatic evaluation:
We will prepare an automatic evaluation server.
You will be able to evaluate the translation results at any time using this server.
- Metric:
BLEU, RIBES, and AM-FM*
* AM-FM is a two-dimensional automatic evaluation metric for machine translation, designed to operate at the sentence level.
It is based on adequacy and fluency, decoupling the semantic and syntactic components of the translation process
to provide a balanced view of translation quality.
- Format:
The submission format and the submission method are given at the submission site below.
- Notice:
When submitting, participants have to agree that the submitted results
are attributed to JST and NICT.
The results will be used and distributed for research by JST and NICT.
Thanks to the technical collaborators: Luis Fernando D'Haro, Rafael E. Banchs and Haizhou Li.
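As an illustration of the BLEU metric listed above, the following is a minimal sketch of corpus-level BLEU with a single reference per sentence: clipped n-gram precisions up to 4-grams, combined by a geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization and no smoothing; the preprocessing and exact settings of the WAT evaluation server are not described here and may differ, so scores from this sketch are not comparable to official ones. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one reference per hypothesis, whitespace tokens (assumption)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))
```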
Human evaluation:
Human evaluation will be carried out with two methods:
Pairwise Crowdsourcing Evaluation and JPO Adequacy Evaluation.
- Pairwise Crowdsourcing Evaluation:
Pairwise Crowdsourcing Evaluation will be carried out using crowdsourcing.
Organizers will sample 400 sentences from the test data for the pairwise
crowdsourcing evaluation.
Participants can submit translation results for the human evaluation
at most twice before the pairwise crowdsourcing evaluation deadline.
(For automatic evaluation, there is no limit on the number of submissions.)
- Metric:
Sentence-by-sentence pairwise evaluation against the baseline system.
The crowdsourcing workers will be asked to judge which translation is better
in terms of adequacy and fluency.
To guarantee the quality of the evaluation, each sentence is evaluated by 5
different workers and the final decision is made by a vote over the judgements.
- Format:
The submission format is the same as that of automatic evaluation.
Participants can select their translation results from the ones submitted
for automatic evaluation.
- Ranking:
All systems will be ranked by the percentage of translations judged
to improve upon the baseline system.
- JPO Adequacy Evaluation:
We will also evaluate using the criteria of the Content Transmission Level Evaluation defined by JPO
(pages 5 to 8 in the PDF file (Japanese Page)).
We will sample 200 sentences from the pairwise crowdsourcing evaluation data
for the JPO adequacy evaluation.
The JPO adequacy evaluation will be conducted only for the translation results of the top three teams
in the Pairwise Crowdsourcing Evaluation for each subtask's language pair.
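As an illustration of the Pairwise Crowdsourcing Evaluation described above, the following is a minimal sketch of how five worker judgements per sentence could be combined by voting and how a system could then be scored by the percentage of sentences judged better than the baseline. The vote labels, the rule that a sentence counts as a win when more workers prefer the system than prefer the baseline, and the function names are assumptions made for illustration; the exact voting rule used by the organizers is not specified here.

```python
from collections import Counter


def sentence_decision(votes):
    """Combine worker votes for one sentence.

    votes: list of 'system', 'baseline', or 'tie' (assumed labels).
    """
    counts = Counter(votes)
    if counts["system"] > counts["baseline"]:
        return "win"
    if counts["baseline"] > counts["system"]:
        return "loss"
    return "tie"


def improvement_rate(all_votes):
    """Percentage of sentences on which the system beats the baseline."""
    decisions = [sentence_decision(v) for v in all_votes]
    if not decisions:
        return 0.0
    return 100.0 * decisions.count("win") / len(decisions)


# Toy judgements for four sentences, five workers each (hypothetical data).
judgements = [
    ["system", "system", "system", "baseline", "baseline"],
    ["baseline", "baseline", "baseline", "baseline", "tie"],
    ["tie", "tie", "tie", "tie", "tie"],
    ["system", "system", "system", "system", "system"],
]
print(improvement_rate(judgements))
```

Systems could then be sorted by this percentage to produce a ranking, with the sampled sentences playing the role of the 400-sentence evaluation set.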
Submission:
The submission site is now open.
(A user name and password are required for access.)
Evaluation results:
Evaluation results site is now open.
Participants who submit results for human evaluation are required to submit description
papers of their translation systems and evaluation results.
All submissions and feedback are handled electronically as below.
REGISTRATION
The application site for task participants of WAT2018 is now open.
ORGANIZERS
- Toshiaki Nakazawa, The University of Tokyo, Japan
- Chenchen Ding, National Institute of Information and Communications Technology (NICT), Japan
- Shohei Higashiyama, National Institute of Information and Communications Technology (NICT), Japan
- Raj Dabre, National Institute of Information and Communications Technology (NICT), Japan
- Anoop Kunchukuttan, Microsoft AI and Research, India
- Win Pa Pa, University of Computer Studies, Yangon (UCSY), Myanmar
- Isao Goto, Japan Broadcasting Corporation (NHK), Japan
- Hideya Mino, Japan Broadcasting Corporation (NHK), Japan
- Katsuhito Sudoh, Nara Institute of Science and Technology (NAIST), Japan
- Graham Neubig, Carnegie Mellon University (CMU), USA
- Hideto Kazawa, Google, Japan
- Yusuke Oda, Google, Japan
- Jun Harashima, Cookpad Inc., Japan
- Sadao Kurohashi, Kyoto University, Japan
- Ir. Hammam Riza, Agency for the Assessment and Application of Technology (BPPT), Indonesia
- Pushpak Bhattacharyya, Indian Institute of Technology Patna (IITP), India
TECHNICAL COLLABORATOR
- Luis Fernando D'Haro, Institute for Infocomm Research, Singapore
- Rafael E. Banchs, Institute for Infocomm Research, Singapore
- Haizhou Li, Institute for Infocomm Research, Singapore
For questions, comments, etc., please email "wat -at- nlp -dot- ist -dot- i -dot- kyoto -hyphen- u -dot- ac -dot- jp".
Japan Patent Office
The Association for Natural Language Processing (Japanese Page)
National Institute of Information and Communications Technology (NICT)
PREVIOUS WORKSHOPS
- WAT 2017
The 4th Workshop on Asian Translation (WAT 2017) was held in November 2017 in Taipei.
- WAT 2016
The 3rd Workshop on Asian Translation (WAT 2016) was held in December 2016 in Osaka, Japan.
- WAT 2015
The 2nd Workshop on Asian Translation (WAT 2015) was held in October 2015 in Kyoto, Japan.
- WAT 2014
The 1st Workshop on Asian Translation (WAT 2014) was held in October 2014 in Tokyo, Japan.
CHANGE LOG
2018-12-01: sponsor updated
2018-11-27: timetable and panel discussion are updated
2018-11-25: camera-ready deadline was extended
2018-10-09: submission site updated
2018-08-24: important dates updated, sponsor updated
2018-08-22: submission site and evaluation results site opened
2018-07-19: mixed domain tasks were updated
2018-07-18: Indic languages multilingual tasks were updated
2018-07-14: application site opened
2018-07-02: updated
2018-06-30: site opened
NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2018-08-24