MultiIndicMT: An Indic Language Multilingual Task

INTRODUCTION

Given the growing sizes of monolingual and parallel training data for Indic languages, we extend the WAT 2021 Indic languages task with additional languages and n-way evaluation corpora.

TASK DESCRIPTION

The task covers 15 Indic languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Nepali, Oriya, Punjabi, Sindhi, Sinhala, Tamil, Telugu, and Urdu) and English. We will evaluate submissions on 30 translation directions (English-Indic and Indic-English). We will also evaluate performance on the Bengali-Hindi, Tamil-Telugu, Hindi-Malayalam, and Sindhi-Punjabi Indic language pairs. Individually, Indic languages are resource-poor relative to European languages, which hampers translation quality. However, leveraging multilingualism and abundant monolingual corpora can substantially boost translation quality. The purpose of this task is to validate the utility of MT techniques that focus on multilingualism and/or monolingual data.
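
As a quick check on the counts above, the English-centric evaluation directions can be enumerated with a short Python sketch. This is only an illustration: the variable and function names are hypothetical, language names stand in for whatever identifiers the corpora actually use, and the Indic-Indic pairs are listed as unordered pairs because the text does not state which direction(s) will be scored.

```python
# Illustrative sketch only: names below are hypothetical, not part of the task kit.
INDIC_LANGUAGES = [
    "Assamese", "Bengali", "Gujarati", "Hindi", "Kannada", "Malayalam",
    "Marathi", "Nepali", "Oriya", "Punjabi", "Sindhi", "Sinhala",
    "Tamil", "Telugu", "Urdu",
]

# Indic-Indic pairs named in the task description; the evaluated
# direction(s) are not specified there, so these are kept as plain pairs.
INDIC_INDIC_PAIRS = [
    ("Bengali", "Hindi"),
    ("Tamil", "Telugu"),
    ("Hindi", "Malayalam"),
    ("Sindhi", "Punjabi"),
]


def english_centric_directions(languages):
    """Return (source, target) tuples for English-Indic and Indic-English."""
    directions = []
    for lang in languages:
        directions.append(("English", lang))  # English-Indic
        directions.append((lang, "English"))  # Indic-English
    return directions


directions = english_centric_directions(INDIC_LANGUAGES)
print(len(directions))  # 15 languages x 2 directions = 30
```

Each of the 15 languages contributes one direction into English and one out of English, which gives the 30 directions mentioned above. The four Indic-Indic pairs are evaluated in addition to these.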

Corpora

Submission Details

CONTACT

For general questions or comments, please email "wat-organizer -at- googlegroups -dot- com". For questions related to this task, contact "prajdabre -at- gmail -dot- com" or "anoop.kunchukuttan -at- gmail -dot- com".

NICT (National Institute of Information and Communications Technology)
Last Modified: 2022-03-24