
MultiIndicMT: An Indic Language Multilingual Task


INTRODUCTION

This year, we expand the WAT 2022 Indic languages task from 15 to 19 Indic languages.

TASK DESCRIPTION

The task covers 19 Indic languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Nepali, Oriya, Punjabi, Sindhi [Arabic script], Sinhala, Tamil, Telugu, Santali, Kashmiri [Arabic as well as Devanagari script], Maithili, Sanskrit and Urdu) and English. We will evaluate the submissions on 38 translation directions (English-Indic and Indic-English). We will also evaluate performance on the Bengali-Hindi, Tamil-Telugu, Hindi-Malayalam and Sindhi-Punjabi Indic language pairs. Individually, Indic languages are resource-poor relative to European languages, which hampers translation quality. However, by leveraging multilingualism and abundant monolingual corpora, translation quality can be substantially boosted. The purpose of this task is to validate the utility of MT techniques that focus on multilingualism and/or monolingual data.
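The figure of 38 directions follows from pairing each of the 19 Indic languages with English in both directions (19 x 2 = 38). The Python sketch below enumerates those English-centric directions as a sanity check. It is illustrative only: the list and function names are hypothetical, Kashmiri is counted once even though two scripts are offered (matching the count of 19 above), and the additional Indic-Indic pairs are left out because the task description does not say in which direction(s) they are evaluated.

```python
# Hypothetical helper: the 19 Indic languages named in the task description.
# Kashmiri is counted once, although it is offered in Arabic and Devanagari script.
INDIC_LANGUAGES = [
    "Assamese", "Bengali", "Gujarati", "Hindi", "Kannada", "Malayalam",
    "Marathi", "Nepali", "Oriya", "Punjabi", "Sindhi", "Sinhala", "Tamil",
    "Telugu", "Santali", "Kashmiri", "Maithili", "Sanskrit", "Urdu",
]


def english_centric_directions(indic_languages):
    """Return (source, target) pairs for English-Indic and Indic-English."""
    en_to_xx = [("English", lang) for lang in indic_languages]
    xx_to_en = [(lang, "English") for lang in indic_languages]
    return en_to_xx + xx_to_en


directions = english_centric_directions(INDIC_LANGUAGES)
print(len(INDIC_LANGUAGES), len(directions))  # prints "19 38"
```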

Corpora

Submission Details


CONTACT

For general questions, comments, etc., please email "wat-organizer -at- googlegroups -dot- com". For questions related to this task, contact "prajdabre -at- gmail -dot- com" or "anoop.kunchukuttan -at- gmail -dot- com".


NICT (National Institute of Information and Communications Technology)
Last Modified: 2023-05-02