The 1st Workshop on Asian Translation
9:30am-12:00pm, October 4, 2014
WAT 2015 is now open.
Lecture Room 241, 4th floor, Faculty of Engineering Bldg.2,
Hongo Campus, The University of Tokyo, Japan
The PHOTOS and PAPERS pages are now open.
The Workshop on Asian Translation (WAT) is a new open evaluation campaign focusing on Asian languages.
We would like to invite a broad range of participants and conduct various forms of machine translation experiments and evaluation.
Collecting and sharing our knowledge will allow us to understand the essence of machine translation and the problems to be solved.
We are working toward the practical use of machine translation among all Asian countries.
For the 1st WAT, we chose scientific papers as the targeted domain, and selected the languages Japanese, Chinese and English.
What makes WAT unique:
- Open innovation platform
The test data is fixed and open, so you can repeat evaluations on the same data and confirm changes in translation accuracy over time.
WAT has no deadline for the automatic translation quality evaluation (continuous evaluation), so you can submit translation results at any time.
- Domain and language pairs
WAT is the world's first workshop that uses scientific papers as a domain and Japanese-Chinese as a language pair.
In the future, we will add more Asian languages, such as Korean, Vietnamese, Indonesian, Thai, Myanmar and so on.
- Context-aware evaluation
The test data of WAT is prepared using the paragraph as a unit, while almost all other evaluation campaigns use the sentence as a unit.
We would like to consider how to realize context-aware evaluation in WAT.
- Evaluation method
Evaluation will be done by both automatic and human evaluation.
For human evaluation, WAT will use crowdsourcing, which is low cost and allows multiple evaluations.
|9:30 - 10:10||Invited talk: Hammam Riza|
|10:10 - 10:25||Overview of WAT2014: Toshiaki Nakazawa, Hideya Mino, Isao Goto, Sadao Kurohashi, Eiichiro Sumita|
|10:25 - 10:40||Weblio Pre-reordering Statistical Machine Translation System: Zhongyuan Zhu|
|10:40 - 10:55||Forest-to-String SMT for Asian Language Translation: NAIST at WAT 2014: Graham Neubig|
|10:55 - 11:00||Break|
|11:00 - 11:10||Poster booster session (10 systems)|
|11:10 - 11:50||Poster presentation (all 12 systems)|
|11:50 - 12:00||Closing|
|12:00 - 12:05||Commemorative photo|
Dr. Ir. Hammam Riza, Director, Agency for the Assessment and Application of Technology (BPPT)
Title: Leveraging ASEAN economic communities 2015 through Language Translation
(Click here for more information.)
|Crowdsourcing evaluation due||August 31, 2014 (extended; originally July 31, 2014)|
|System description draft paper due||September 14, 2014 (extended; originally August 31, 2014)|
|Review feedback||September 21, 2014 (extended; originally September 7, 2014)|
|System description camera-ready paper due||September 28, 2014 (extended; originally September 14, 2014)|
|WAT 2014||9:30am-12:00pm, October 4, 2014|
The task is to improve the text translation quality for scientific papers.
Participants choose any of the subtasks in which they would like to participate and translate the test data using their machine translation systems.
The WAT organizers will evaluate the submitted results using automatic evaluation and crowdsourced human evaluation.
We also provide a baseline machine translation system using Moses.
- English -> Japanese
- Japanese -> English
- Chinese -> Japanese
- Japanese -> Chinese
WAT uses ASPEC as its dataset, which includes training, development, development-test and test data.
Participants must obtain a copy of ASPEC themselves.
ASPEC consists of approximately 3 million Japanese-English parallel sentences from paper abstracts (ASPEC-JE) and approximately 0.7 million Japanese-Chinese paper excerpts (ASPEC-JC).
The baseline system site is now open.
We will evaluate the translation performance of the results submitted through automatic evaluation and human evaluation.
We will prepare an automatic evaluation server.
You will be able to evaluate the translation results at any time using this server.
The automatic evaluation metrics are BLEU and RIBES.
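To illustrate what the automatic evaluation measures, here is a minimal pure-Python sketch of corpus-level BLEU with a single reference per sentence. This is a simplified version (no smoothing, whitespace tokenization, single reference) for illustration only; the actual evaluation server uses standard tools.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, as a Counter."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU: modified n-gram precision for n = 1..max_n,
    combined by a geometric mean and scaled by the brevity penalty."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngrams(hyp, n)
            ref_ngrams = ngrams(ref, n)
            # Clipped counts: each hypothesis n-gram is credited at most
            # as many times as it occurs in the reference.
            matches[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(hyp_ngrams.values())
    if min(matches) == 0:
        return 0.0  # some n-gram order has no matches (no smoothing)
    precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = min(1.0, math.exp(1 - ref_len / hyp_len))  # brevity penalty
    return bp * math.exp(precision)

hyp = "the cat sat on the mat".split()
ref = "the cat sat on the mat".split()
print(bleu([hyp], [ref]))  # identical sentences score 1.0
```

RIBES additionally rewards correct word order via rank correlation, which matters for language pairs with very different word order such as Japanese-English; it is not sketched here.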
The submission format and the submission method are given at the submission site below.
When submitting, participants must agree that the submitted results are attributed to JST and NICT.
The results will be used and distributed for research by JST and NICT.
Human evaluation will be carried out using crowdsourcing.
Organizers will sample 400 sentences from the test data for human evaluation.
Participants may submit translation results for human evaluation at most twice before the crowdsourcing evaluation deadline.
Human evaluation is a sentence-by-sentence pairwise comparison against the baseline system.
The crowdsourcing workers will be asked to judge which of the two translations is better in terms of adequacy and fluency.
The submission format is the same as that of automatic evaluation.
Participants can select their translation results from the ones submitted for automatic evaluation.
All systems will be ranked by the percentage of translations judged to improve upon the baseline system.
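The ranking rule above can be sketched as follows. The judgment data and system names here are hypothetical, and the actual WAT scoring may differ in details such as how ties are handled.

```python
from collections import defaultdict

# Hypothetical judgment records: (system, verdict), where verdict is
# "win" if the crowd judged the system's translation better than the
# baseline's, "loss" if worse, and "tie" otherwise.
judgments = [
    ("system_A", "win"), ("system_A", "win"), ("system_A", "loss"),
    ("system_A", "tie"),
    ("system_B", "win"), ("system_B", "loss"), ("system_B", "loss"),
    ("system_B", "tie"),
]

def rank_by_win_rate(judgments):
    """Rank systems by the percentage of sentences judged to improve
    on the baseline (ties and losses both count against the rate)."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for system, verdict in judgments:
        total[system] += 1
        if verdict == "win":
            wins[system] += 1
    rates = {s: 100.0 * wins[s] / total[s] for s in total}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)

for system, rate in rank_by_win_rate(judgments):
    print(f"{system}: {rate:.1f}% judged better than baseline")
```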
Submission site is now open.
(A user name and password are required for access.)
Evaluation results site is now open.
PAPER SUBMISSION INFORMATION
Participants who submit results for human evaluation should submit papers describing their translation systems and evaluation results.
All submissions and feedback are handled electronically as below.
PDF format, two (2) columns, with font no smaller than nine (9) point throughout the paper.
Papers should be written in English.
We strongly prefer that papers include a section entitled "Issues for Context-aware Machine Translation" which discusses the importance and usefulness of context.
- Maximum length:
The maximum length of a submitted paper is eight (8) pages.
Participants must use the ACL 2014 LaTeX style files or Microsoft Word style files that are available on the ACL 2014 conference web site.
- Submission site:
The submission site is open to only task participants.
- Registration for task participants:
The registration site for WAT2014 task participants is now open.
Registration is free for all participants.
- Registration for WAT2014 audience members:
There is no need to register in advance.
Admission is free for everyone, including participants.
- Toshiaki Nakazawa (Japan Science and Technology Agency (JST), Japan)
- Hideya Mino (National Institute of Information and Communications Technology (NICT), Japan)
- Isao Goto (Japan Broadcasting Corporation (NHK), Japan)
- Eiichiro Sumita (National Institute of Information and Communications Technology (NICT), Japan)
- Sadao Kurohashi (Kyoto University, Japan)
For questions, comments, etc., please email "wat-at-nlp.ist.i.kyoto-u.ac.jp".
The Association for Natural Language Processing (Japanese Page)
Asian Scientific Paper Excerpt Corpus (ASPEC)
Japan Science and Technology Agency (JST)
National Institute of Information and Communications Technology (NICT)
2014-10-12: INVITED TALK slide
2014-10-12: PAPERS slides
2014-07-17: IMPORTANT DATES (Extended)
2014-07-14: baseline system site, submission site and evaluation results site open
2014-07-04: registration site open
2014-03-05: site open
JST (Japan Science and Technology Agency)
NICT (National Institute of Information and Communications Technology)
Last Modified: 2015-02-25