
WAT 2016

BPPT Corpus

Registration for use of the BPPT Corpus in WAT 2016 opened on 2016/6/13.

INTRODUCTION

The BPPT Corpus was constructed by Badan Pengkajian dan Penerapan Teknologi (BPPT, the Agency for the Assessment and Application of Technology). It is an Indonesian-English parallel news corpus of 50K sentences, divided into five sections: Finance, International, Science and Technology, National, and Sports. The data come from the Antara News Agency.

DETAIL

The BPPT Corpus includes training, development, development-test, and test data. The numbers of sentences are as follows:

Data Type   File Name     Section                   Number of sentences   Total number of sentences
TRAIN       train.txt     Finance                                 6,514                      50,000
                          International                           9,704
                          Science and Technology                 10,129
                          National                                9,847
                          Sports                                 13,806
DEV         dev.txt       Finance                                    52                         400
                          International                              78
                          Science and Technology                     81
                          National                                   79
                          Sports                                    110
DEVTEST     devtest.txt   Finance                                    52                         400
                          International                              78
                          Science and Technology                     81
                          National                                   79
                          Sports                                    110
TEST        test.txt      Finance                                    52                         400
                          International                              78
                          Science and Technology                     81
                          National                                   79
                          Sports                                    110
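As a quick consistency check on the table, the per-section counts can be summed and compared with the stated totals. The Python sketch below hard-codes the numbers from the table; the names SECTION_COUNTS, TOTALS, check_totals and section_shares are hypothetical helpers for illustration only, not part of the corpus distribution. The script does not read train.txt or the other files, since their internal format is not described on this page.

```python
# Hypothetical helper script: sentence counts copied from the table above.
# It does NOT read the corpus files; their format is not described here.

# DEV, DEVTEST and TEST share the same per-section counts in the table.
EVAL_SECTIONS = {
    "Finance": 52,
    "International": 78,
    "Science and Technology": 81,
    "National": 79,
    "Sports": 110,
}

SECTION_COUNTS = {
    "train.txt": {
        "Finance": 6514,
        "International": 9704,
        "Science and Technology": 10129,
        "National": 9847,
        "Sports": 13806,
    },
    "dev.txt": dict(EVAL_SECTIONS),
    "devtest.txt": dict(EVAL_SECTIONS),
    "test.txt": dict(EVAL_SECTIONS),
}

# Stated totals from the "Total number of sentences" column.
TOTALS = {"train.txt": 50000, "dev.txt": 400, "devtest.txt": 400, "test.txt": 400}


def check_totals(counts, totals):
    """Return {file: (section_sum, stated_total)} for files whose sections do not add up."""
    mismatches = {}
    for name, sections in counts.items():
        section_sum = sum(sections.values())
        if section_sum != totals[name]:
            mismatches[name] = (section_sum, totals[name])
    return mismatches


def section_shares(sections):
    """Percentage of a file's sentences in each section, rounded to one decimal."""
    total = sum(sections.values())
    return {name: round(100 * n / total, 1) for name, n in sections.items()}


if __name__ == "__main__":
    print(check_totals(SECTION_COUNTS, TOTALS))  # {} when every file adds up
    print(section_shares(SECTION_COUNTS["train.txt"]))
```

Running it prints an empty dictionary, meaning every file's section counts add up to its stated total, followed by the percentage share of each section in train.txt.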

HOW TO OBTAIN

WAT 2016 has already been held. Please wait for the next WAT announcement.

AGREEMENT


CONTACT

For questions, comments, etc., please email "wat -at- nlp.ist.i.kyoto-u.ac.jp".

CHANGE LOG

2016-05-11: site opened


JST (Japan Science and Technology Agency)
NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2016-06-13