In collaboration with SAP and NICT, WAT will evaluate Hindi/Thai/Malay/Indonesian <--> English translation in two domains: the IT domain (Software Documentation) and the Wikinews domain (ALT). The purpose is to determine the feasibility of multilingualism, domain adaptation, or document-level knowledge when little to no clean parallel data is available for training.
The IT and Wikinews domains are both extremely low-resource for machine translation, especially for languages such as Hindi, Thai, Malay and Indonesian. For Wikinews, clean parallel corpora exist but are extremely small (approx. 18,000 lines). For the IT domain, there are no clean parallel corpora at all. In low-resource settings, it is often helpful to leverage monolingual or bilingual corpora from multiple languages and domains to boost translation quality. Additionally, because the evaluation sets for both tasks contain document-level splits (as metadata), it should be possible to leverage extended context to improve translation quality. Thus, the purpose of this task is to determine how far translation quality can be pushed in such a setting through a combination of multilingualism, domain adaptation, or document-level knowledge. The specific details of this task are:
For general questions, comments, etc., please email "wat-organizer -at- googlegroups -dot- com". For questions related to this task, contact "prajdabre -at- gmail -dot- com".
NICT (National Institute of Information and Communications Technology)
SAP
Last Modified: 2021-01-04