Document-level Newswire Translation sub-task

INTRODUCTION

The document-level newswire translation subtask uses the JIJI Corpus, which was constructed by Jiji Press Ltd in collaboration with the National Institute of Information and Communications Technology (NICT) and NHK. The corpus is a Japanese-English news corpus of 200K parallel sentences. The data come from Jiji Press news in various categories, including politics, economy, nation, business, markets, and sports. The original news items were distributed to many newspaper companies, TV stations, and portal sites. Jiji Press aims to introduce machine translation technologies into its daily editorial work in the future.

TASK DESCRIPTION

Task description is here.

DETAIL

JIJI Corpus includes:

The numbers of sentences are as follows:

Data Type   File Name     Number of sentences
TRAIN       train.txt     200,000
DEV         dev.txt       2,000
DEVTEST     devtest.txt   2,000
TEST        test.txt      2,000

Data Type   File Name              Quantity
DEV         devc.tsv               479 sentence pairs
            context-devc.en.tsv    132 articles
            context-devc.ja.tsv    132 articles
TEST        testc.tsv              1,851 sentence pairs
            context-testc.en.tsv   546 articles
            context-testc.ja.tsv   546 articles
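
After downloading the corpus, it can be useful to confirm that each file has the size listed above. The following is a minimal sketch, not an official tool. It assumes that every listed file stores one sentence (or sentence pair) per non-empty line, which this page does not state. The context-*.tsv files are left out because their sizes are given in articles and their layout is not described here. The names EXPECTED_LINES, count_lines, and check_corpus are hypothetical.

```python
from pathlib import Path

# Expected sizes copied from the tables above.
# Assumption: one sentence (or sentence pair) per non-empty line.
EXPECTED_LINES = {
    "train.txt": 200_000,
    "dev.txt": 2_000,
    "devtest.txt": 2_000,
    "test.txt": 2_000,
    "devc.tsv": 479,
    "testc.tsv": 1_851,
}


def count_lines(path):
    """Count the non-empty lines in a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def check_corpus(corpus_dir, expected=EXPECTED_LINES):
    """Return {file name: (expected, actual)} for every file whose count differs.

    A missing file is reported with an actual count of None.
    """
    mismatches = {}
    for name, want in expected.items():
        path = Path(corpus_dir) / name
        got = count_lines(path) if path.is_file() else None
        if got != want:
            mismatches[name] = (want, got)
    return mismatches
```

Calling check_corpus on the directory that holds the downloaded files returns an empty dictionary when every count matches under the stated assumption. A non-empty result only indicates a difference from that assumption and does not by itself mean the download is broken.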

HOW TO OBTAIN

AGREEMENT

Instructions for the use of JIJI Corpus

Expressions including personal information cannot be used as examples in papers or presentations.

Personal information must be anonymized when expressions including personal information are used as examples in papers or presentations.

CONTACT

For questions or comments, please email "wat-organizer -at- googlegroups -dot- com".

CHANGE LOG

2022-7-26: the address to which the agreement form is sent was updated
2022-4-18: agreement forms were updated for WAT2022
2021-1-24: agreement forms were updated for WAT2021
2019-6-12: corpus and agreement forms were updated for WAT2020
2019-4-22: agreement forms were updated for WAT2019
2018-8-16: agreement forms were updated for WAT2018
2017-6-12: site open


NICT (National Institute of Information and Communications Technology)
NHK (Japan Broadcasting Corporation)
Last Modified: 2022-04-18