MultiIndicMT: An Indic Language Multilingual Task

INTRODUCTION

Given the growing size of monolingual and parallel training data for Indic languages, we extend the WAT 2020 Indic languages task with additional languages and n-way evaluation corpora.

TASK DESCRIPTION

The task covers 10 Indic languages (Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil and Telugu) and English. We will evaluate submissions on 20 translation directions (English-Indic and Indic-English). We are also exploring the possibility of evaluating translation between some Indian language pairs and will keep you updated on that. Individually, Indic languages are resource-poor, which hampers translation quality, but leveraging multilingualism and abundant monolingual corpora can substantially improve it. The purpose of this task is to validate the utility of MT techniques that focus on multilingualism and/or monolingual data.
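
For participants scripting their training or evaluation runs, the 20 directions can be enumerated as in the minimal sketch below. The ISO 639-1 language codes and the helper name `translation_directions` are assumptions made for illustration; they are not taken from the task's data release or submission format.

```python
# Indic languages named in the task description, keyed by ISO 639-1 code.
# The codes are an assumption for illustration, not an official task convention.
INDIC_LANGUAGES = {
    "bn": "Bengali",
    "gu": "Gujarati",
    "hi": "Hindi",
    "kn": "Kannada",
    "ml": "Malayalam",
    "mr": "Marathi",
    "or": "Oriya",
    "pa": "Punjabi",
    "ta": "Tamil",
    "te": "Telugu",
}
ENGLISH = "en"


def translation_directions(indic_codes, english=ENGLISH):
    """Return (source, target) pairs for English-Indic and Indic-English."""
    en_to_indic = [(english, code) for code in indic_codes]
    indic_to_en = [(code, english) for code in indic_codes]
    return en_to_indic + indic_to_en


directions = translation_directions(INDIC_LANGUAGES)
for source, target in directions:
    print(f"{source}-{target}")
```

Each of the 10 Indic languages contributes one English-Indic and one Indic-English pair, which gives the 20 evaluated directions.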

Corpora

Submission Details

CONTACT

For general questions, comments, etc., please email "wat-organizer -at- googlegroups -dot- com". For questions related to this task, contact "prajdabre -at- gmail -dot- com" or "anoop.kunchukuttan -at- gmail -dot- com".

NICT (National Institute of Information and Communications Technology)
Last Modified: 2021-01-04