
WAT 2020

Khmer-English Parallel Data


Registration for use of the ECCC data is now open (2020/07/17).

INTRODUCTION

The parallel data for the Khmer-English translation tasks at WAT2020 consist of two corpora: the ALT corpus and the ECCC corpus.

DETAIL

The number of sentences in each data set is as follows:

Data Type   File Name            Number of Sentences
TRAIN       train.eccc.[km|en]               104,660
TRAIN       train.alt.[km|en]                 18,088
DEV         dev.alt.[km|en]                    1,000
TEST        test.alt.[km|en]                   1,018
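The [km|en] notation in the file names presumably denotes a pair of files, one Khmer (.km) and one English (.en), e.g. dev.alt.km and dev.alt.en. Below is a minimal Python sketch for loading such a pair and checking its size against the table. It assumes the files are UTF-8 plain text with one sentence per line and that line i of the .km file is the translation of line i of the .en file; this layout is an assumption, not something stated on this page. The names load_parallel, check_counts, EXPECTED_COUNTS and the data_dir argument are hypothetical.

```python
from pathlib import Path

# Sentence counts copied from the table; keys are the file-name prefixes.
EXPECTED_COUNTS = {
    "train.eccc": 104_660,
    "train.alt": 18_088,
    "dev.alt": 1_000,
    "test.alt": 1_018,
}


def read_lines(path):
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_parallel(data_dir, prefix, src="km", tgt="en"):
    """Return aligned (Khmer, English) sentence pairs for one file pair.

    Hypothetical helper: assumes <prefix>.km and <prefix>.en are
    line-aligned, one sentence per line.
    """
    src_lines = read_lines(Path(data_dir) / f"{prefix}.{src}")
    tgt_lines = read_lines(Path(data_dir) / f"{prefix}.{tgt}")
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"{prefix}: {len(src_lines)} {src} lines vs {len(tgt_lines)} {tgt} lines"
        )
    return list(zip(src_lines, tgt_lines))


def check_counts(data_dir):
    """Map each prefix to (actual pairs, expected count from the table)."""
    return {
        prefix: (len(load_parallel(data_dir, prefix)), expected)
        for prefix, expected in EXPECTED_COUNTS.items()
    }
```

Raising an error on a length mismatch, rather than silently truncating with zip, makes a misaligned or corrupted download visible before training starts.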

HOW TO OBTAIN

Khmer-English Parallel Data for WAT2020


CONTACT

For questions, comments, etc., please email "wat-organizer -at- googlegroups -dot- com".


CHANGE LOG

2020-07-17: site opened


NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2020-07-17