Japanese → English:
Article-level News Translation Tasks

Introduction

This task focuses on translating news articles from Japanese into English.
As part of this article-level translation task, we will evaluate three types of English translations derived from the original Japanese article:
Task 1. Literal translation
Task 2. English Article Style Translation
Task 3. Reconstruction of the original English article

For each task, the English translations generated from the Japanese articles are assessed with both automatic metrics and human evaluation.

Dataset

We are releasing a human-translated, article-level dataset for WAT2025:

  • Train: 227 articles
  • Dev: 50 articles
  • Test: 100 articles (to be distributed in mid-September)
The distribution method for this dataset is described in the "Agreement" section below.

In addition, we provide the previously distributed sentence-level dataset (Jiji 2020):

  • Train: 200,000 sentences
  • Dev: 479 sentences
  • Test: 1,851 sentences
For more information about Jiji 2020, please visit our previous WAT2020 website.

Dataset Details

The dataset for this task was created from Jiji Press articles on weather and disasters published in 2024.
The distributed dataset is in JSON Lines (jsonl) format: each line is a single JSON object with the following keys:

article_id: A 12-digit ID made up of four digits for the year, four digits for the month and day, and four digits for the serial number (e.g., 202504010001 for an article published on April 1, 2025, with serial number 0001).
ja_original: Original Japanese article. (Source)
en_literal: Literal English translation of the Japanese article. (Task 1)
en_news-style: An English article-style translation, created by editing the original English article to match the content of the original Japanese article.
The order of information, vocabulary, and number of lines may differ from the original Japanese article. (Task 2)

en_original: Original English article. It reports the same news as the original Japanese article and is written for an international audience.
Its content may differ from the Japanese version. (Task 3)

tags_ja: Tag information in Japanese (e.g., topic or category)

{
"article_id": 202401010001,
"ja_original": "Japanese original title\nBody sentence 1. Body sentence 2. ...",
"en_literal": "English literal style title\nBody sentence 1. Body sentence 2. ...",
"en_news-style": "English article style title\nBody sentence 1. Body sentence 2. ...",
"en_original": "English original title\nBody sentence 1. Body sentence 2. ...",
"tags_ja": "地震 (Earthquake), 石川県 (Ishikawa Prefecture)"
}
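A minimal Python sketch for reading a file in this layout follows. It assumes UTF-8 encoding, one JSON object per line, and that each text field separates its title from its body at the first newline, as in the example record above. The helper names (`load_jsonl`, `parse_article_id`, `split_title_body`, `ArticleId`) are hypothetical and not part of any official toolkit; `parse_article_id` simply decodes the 12-digit `article_id` layout described above.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple, Union


@dataclass
class ArticleId:
    published: date  # publication date encoded in the first eight digits
    serial: int      # four-digit serial number for that day


def parse_article_id(article_id: Union[int, str]) -> ArticleId:
    """Decode a 12-digit ID laid out as YYYY MMDD NNNN."""
    s = str(article_id)
    if len(s) != 12 or not s.isdigit():
        raise ValueError(f"expected 12 digits, got {s!r}")
    return ArticleId(
        published=date(int(s[0:4]), int(s[4:6]), int(s[6:8])),
        serial=int(s[8:12]),
    )


def split_title_body(text: str) -> Tuple[str, str]:
    """Assumption: the title ends at the first newline; the rest is the body."""
    title, _, body = text.partition("\n")
    return title, body


def load_jsonl(path: str) -> List[Dict]:
    """Read one JSON object per non-empty line (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    aid = parse_article_id(202504010001)
    print(aid.published.isoformat(), aid.serial)  # 2025-04-01 1
    print(split_title_body("Japanese original title\nBody sentence 1. Body sentence 2. ..."))
```

Splitting at the first newline only, rather than at every newline, keeps multi-line bodies intact.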

Examples of each translation type (fictional news):

  • Original Japanese article (Source)
    東京で桜が満開に\n
    東京都内の各地で桜が見頃を迎え、多くの人々が花見を楽しんだ。今年の開花は例年より早いという。...

  • Literal Translation (Task 1)
    Cherry blossoms in full bloom in Tokyo\n
    Cherry blossoms are at their best in various parts of Tokyo, and many people enjoyed hanami. This year's blooming is earlier than usual. ...

  • English Article Style Translation (Task 2)
    Tokyo’s cherry blossoms reach full bloom\n
    Tokyo, April 2 (xxxx Press)—This year, cherry blossoms bloomed earlier than usual, reached their peak across Tokyo, and many people enjoyed hanami (flower viewing) throughout the city. ...

  • Original English article (Task 3)
    Tokyo’s cherry blossoms reach full bloom\n
    Tokyo, April 2 (xxxx Press)—This year, cherry blossoms bloomed earlier than usual, and the traditional hanami (flower viewing) festivities are in full swing. ...
The translation examples above were generated automatically by GPT-4.1.
All distributed data, in contrast, consists of original articles and human-created translations.

Evaluation

For the test data, a jsonl file containing only the original Japanese articles will be distributed in mid-September.
Please store each of your model's outputs under the corresponding key:

{
"article_id": 2024xxxxxxxx,
"ja_original": "Japanese original title\nBody sentence 1. Body sentence 2. ...",
"en_literal_output": "your model output",
"en_news-style_output": "your model output",
"en_original_output": "your model output"
}
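Assuming the test file follows the same jsonl layout as the training data, a submission file could be assembled as in the Python sketch below. Only the key names come from the template above; `build_submission`, `to_jsonl`, and the `translate` callback are hypothetical names, with `translate` standing in for your own system.

```python
import json
from typing import Callable, Dict, Iterable, List

# Output keys taken from the submission template, one per task.
OUTPUT_KEYS = ("en_literal_output", "en_news-style_output", "en_original_output")


def build_submission(
    test_records: Iterable[Dict],
    translate: Callable[[str, str], str],
) -> List[Dict]:
    """Copy article_id and ja_original from each test record and fill every output key.

    `translate(ja_text, output_key)` is a hypothetical stand-in for your own system.
    """
    rows = []
    for rec in test_records:
        row = {"article_id": rec["article_id"], "ja_original": rec["ja_original"]}
        for key in OUTPUT_KEYS:
            row[key] = translate(rec["ja_original"], key)
        rows.append(row)
    return rows


def to_jsonl(rows: Iterable[Dict]) -> str:
    """One JSON object per line; ensure_ascii=False keeps Japanese text readable."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


if __name__ == "__main__":
    demo = [{"article_id": 202401010001, "ja_original": "タイトル\n本文。"}]
    rows = build_submission(demo, lambda ja, key: f"placeholder for {key}")
    print(to_jsonl(rows), end="")
```

Because `json.dumps` escapes newlines inside strings, each article stays on exactly one line even though titles and bodies are separated by `\n`.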

Submissions will be evaluated using:

  • Automatic metrics: document-level BLEU, COMET, etc.
  • Human evaluation: submissions from the top three teams in the automatic evaluation
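The exact BLEU configuration (tokenizer, smoothing, reference handling) is not specified here. As an illustration only, the Python sketch below computes corpus BLEU with each whole article treated as a single segment, which avoids any sentence alignment; this is plausibly why scoring is document-level, since for Task 2 the order of information and number of lines may differ from the source. It assumes whitespace tokenization, one reference per article, and no smoothing, so its numbers will not match the official scores.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def doc_bleu(hyps: List[str], refs: List[str], max_n: int = 4) -> float:
    """Corpus BLEU (0-100) where each whole document is one segment.

    Assumptions: whitespace tokenization, one reference per document, no smoothing.
    """
    if len(hyps) != len(refs):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ng, r_ng = ngrams(h, n), ngrams(r, n)
            # Clipped matches: a hypothesis n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r_ng[g]) for g, c in h_ng.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter overall than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)
```

Standard tools such as sacreBLEU apply their own tokenization and options, so this sketch is meant only to show the document-as-segment idea.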

Regulations

  • The dataset must not be used for any purpose other than this task.
  • Paid or closed-source models (e.g., GPT-4) may not be used to produce submissions.
  • Dev/Test data must not be used for training.
  • Modifying the training data and creating synthetic data are allowed, provided that none of the data is leaked externally.

Agreement

Participants must sign a usage agreement with Jiji Press before accessing the source articles.

To obtain this article-level dataset and the Jiji 2020 data, follow these steps:

  1. Complete and sign the license agreement (available in English or Japanese). Please read the license agreement carefully.

  2. Scan and email the signed agreement to Jiji Press Ltd. (asaka -at- jiji.co.jp), and send the original copy of the agreement to the following address:

    English:

    ASAKA, Hidehiro
    Sports Business Promotion Office
    JIJI Press LTD.
    5-15-8 Ginza, Chuo-ku,
    Tokyo 104-8178, JAPAN

    Japanese:

    104-8178
    東京都中央区銀座5-15-8
    時事通信社スポーツ事業推進室
    朝賀英裕

  3. Once Jiji Press Ltd. has received the original copy and approved the application, the organizers will email the applicant a link to download these corpora. (Please note that Jiji Press Ltd. will provide the applicant's email address to the organizers.)

If you quote text from the Jiji data in your papers or presentations, please anonymize any personal information it contains.

IMPORTANT DATES

Shared Task Submission Deadline

October 20, 2025

System Description Paper Submission Deadline (Shared Tasks)

November 10, 2025

Review Feedback on System Description Papers

November 12, 2025

Camera-ready Deadline

November 17, 2025

Workshop Dates

December 24, 2025

* All deadlines are 11:59 PM UTC-12 (Anywhere on Earth).
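Because UTC-12 is the last time zone to reach any given date, a deadline at 11:59 PM UTC-12 on a given day corresponds to 11:59 UTC on the following day, or 20:59 the following day in Japan Standard Time (UTC+9). A small Python check using only the standard library; the `DEADLINES` table simply restates the dates listed above, and the names are illustrative.

```python
from datetime import datetime, timedelta, timezone

UTC_MINUS_12 = timezone(timedelta(hours=-12))  # the deadline time zone
JST = timezone(timedelta(hours=9))             # Japan Standard Time (no DST)

# Participant-facing deadlines restated from the list above.
DEADLINES = {
    "Shared task submission": (2025, 10, 20),
    "System description paper": (2025, 11, 10),
    "Camera-ready": (2025, 11, 17),
}


def deadline(year: int, month: int, day: int, tz: timezone = timezone.utc) -> datetime:
    """Return 11:59 PM UTC-12 on the given date, converted to `tz`."""
    return datetime(year, month, day, 23, 59, tzinfo=UTC_MINUS_12).astimezone(tz)


if __name__ == "__main__":
    for name, ymd in DEADLINES.items():
        print(f"{name}: {deadline(*ymd).isoformat()} = {deadline(*ymd, tz=JST).isoformat()}")
```

Using a fixed UTC+9 offset for Japan is safe because Japan does not observe daylight saving time.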

ORGANIZERS

  • Naoto Shirai (shirai.n-hk -at- nhk.or.jp), NHK

  • Hitoshi Ito (itou.h-ce -at- nhk.or.jp), NHK

  • Kazutaka Kinugawa (kinugawa.k-jg -at- nhk.or.jp), NHK

  • Hideya Mino (mino.h-gq -at- nhk.or.jp), NHK

ACKNOWLEDGEMENTS

We are deeply grateful to Hidehiro Asaka and Takayuki Kawakami for providing the valuable data used in this research.

These research results were obtained from commissioned research (No. 225) by the National Institute of Information and Communications Technology (NICT), Japan.

CHANGE LOG

2025-09-01: Update Dataset Information

2025-08-22: Update Dataset Information

2025-08-06: Dev data release

2025-07-04: Site opened