
JIJI Corpus


INTRODUCTION

The JIJI Corpus was constructed by Jiji Press Ltd. in collaboration with the National Institute of Information and Communications Technology (NICT). It is a Japanese-English news corpus of 200K parallel sentences. The data come from Jiji Press news in various categories, including politics, economy, nation, business, markets, and sports. The original news articles were distributed to many newspaper companies, TV stations, and portal sites. Jiji Press aims to introduce machine translation technologies into its daily editorial work in the future.

DETAIL

JIJI Corpus includes:

The number of sentences in each file is as follows:

Data Type   File Name     Number of sentences
TRAIN       train.txt     200,000
DEV         dev.txt       2,000
DEVTEST     devtest.txt   2,000
TEST        test.txt      2,000
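
After obtaining the corpus, the counts in the table above can be used as a quick sanity check. The following is a minimal sketch only. It assumes each file stores one sentence pair per line in UTF-8, which this page does not state, so adjust the counting logic if the actual file layout differs. The function names `count_lines` and `check_corpus` are hypothetical helpers written for illustration and are not part of the distribution.

```python
from pathlib import Path

# Sentence counts taken from the table on this page.
EXPECTED_COUNTS = {
    "train.txt": 200_000,
    "dev.txt": 2_000,
    "devtest.txt": 2_000,
    "test.txt": 2_000,
}


def count_lines(path):
    """Count non-empty lines in a text file.

    ASSUMPTION: one sentence pair per line, UTF-8 encoded.
    """
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def check_corpus(corpus_dir, expected=EXPECTED_COUNTS):
    """Return {file name: (found, expected)} for every file whose count differs.

    A missing file is reported with a found count of None.
    """
    mismatches = {}
    for name, want in expected.items():
        path = Path(corpus_dir) / name
        found = count_lines(path) if path.is_file() else None
        if found != want:
            mismatches[name] = (found, want)
    return mismatches
```

Calling `check_corpus("path/to/corpus")` with the directory that holds the four files returns an empty dictionary when every count matches the table, and otherwise lists the files that differ.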

HOW TO OBTAIN


AGREEMENT


CONTACT

For questions, comments, etc., please email "wat-organizer -at- googlegroups -dot- com".


CHANGE LOG

2018-8-16: agreement forms were updated for WAT2018
2017-6-12: site open


NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2019-04-22