Introduction
This task focuses on translating news articles from Japanese into English.
As part of this article-level translation task, we will evaluate three types of English translations derived from the original Japanese article:
Task 1. Literal translation
Task 2. English Article Style Translation
Task 3. Reconstruction of the original English article
For each task, the English translations generated from the Japanese articles are evaluated with both automatic metrics and human evaluation.
Dataset
We release a human-translated, article-level dataset for WAT2025:
- Train: 227 articles
- Dev: 50 articles
- Test: 100 articles (to be distributed in mid-September)
In addition, we provide the previously distributed sentence-level dataset (Jiji 2020):
- Train: 200,000 sentences
- Dev: 479 sentences
- Test: 1,851 sentences
Dataset Details
The dataset was created from Jiji Press articles from 2024 related to weather and disasters.
The dataset is distributed in JSONL format (one JSON object per line). Each line contains the following keys:
ja_original: Original Japanese article. (Source)
en_literal: Literal English translation of the Japanese article. (Task 1)
en_news-style: An English translation in news-article style, created by editing the original English article to match the content of the original Japanese article.
The order of information, vocabulary, and number of lines may differ from the original Japanese article. (Task 2)
en_original: Original English article. This is a report of the same news as the original Japanese article, intended for an international audience.
The content of this article may differ from the Japanese version. (Task 3)
tags_ja: Tag information in Japanese (e.g., topic or category)
{
"article_id": 202401010001,
"ja_original": "Japanese original title\nBody sentence 1. Body sentence 2. ...",
"en_literal": "English literal style title\nBody sentence 1. Body sentence 2. ...",
"en_news-style": "English article style title\nBody sentence 1. Body sentence 2. ...",
"en_original": "English original title\nBody sentence 1. Body sentence 2. ...",
"tags_ja": "地震 (Earthquake), 石川県 (Ishikawa Prefecture)"
}
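Because each line is a standalone JSON object, the files can be read line by line with the Python standard library. The sketch below is a minimal illustration, not an official loader; it assumes that the title is separated from the body by the first newline, as in the example record above, and the helper names `read_articles` and `split_title_body` are hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def read_articles(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line (JSONL)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


def split_title_body(text: str) -> Tuple[str, str]:
    """Split an article string into (title, body) at the first newline.

    Assumes the layout of the example record: title, "\n", then body.
    """
    title, _, body = text.partition("\n")
    return title, body
```

For example, `with open("train.jsonl", encoding="utf-8") as f: records = list(read_articles(f))` loads a file, where `train.jsonl` is a placeholder for the name of the distributed file. Accepting any iterable of lines, rather than a path, keeps the helper easy to test without touching the file system.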
Examples of each translation type (fictional news):
| Original Japanese article (Source) | Literal Translation (Task 1) | English Article Style Translation (Task 2) | Original English article (Task 3) |
|---|---|---|---|
| 東京で桜が満開に\n | Cherry blossoms in full bloom in Tokyo\n | Tokyo’s cherry blossoms reach full bloom\n | Tokyo’s cherry blossoms reach full bloom\n |
| 東京都内の各地で桜が見頃を迎え、多くの人々が花見を楽しんだ。今年の開花は例年より早いという。... | Cherry blossoms are at their best in various parts of Tokyo, and many people enjoyed hanami. This year's blooming is earlier than usual. ... | Tokyo, April 2 (xxxx Press)—This year, cherry blossoms bloomed earlier than usual, reached their peak across Tokyo, and many people enjoyed hanami (flower viewing) throughout the city. ... | Tokyo, April 2 (xxxx Press)—This year, cherry blossoms bloomed earlier than usual, and the traditional hanami (flower viewing) festivities are in full swing. ... |
All distributed data consists of original articles and translations created by humans.
Evaluation
Please store your model outputs under the specified keys:
{
"article_id": 2024xxxxxxxx,
"ja_original": "Japanese original title\nBody sentence 1. Body sentence 2. ...",
"en_literal_output": "your model output",
"en_news-style_output": "your model output",
"en_original_output": "your model output"
}
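A submission file could be produced along the following lines. This is a minimal sketch that assumes the submission, like the distributed data, is a JSONL file with one object per article; `write_submission` and the `translate` callback are hypothetical placeholders for your own system, which is called once per output key so that each task can use a different prompt or model.

```python
import json
from typing import Callable, Iterable, TextIO

# Output keys taken from the submission format above.
OUTPUT_KEYS = ("en_literal_output", "en_news-style_output", "en_original_output")


def write_submission(
    records: Iterable[dict],
    translate: Callable[[str, str], str],
    out: TextIO,
) -> None:
    """Write one JSON object per line containing the three output keys.

    `translate(source, key)` is a hypothetical hook: it receives the
    Japanese article and the name of the output key being filled.
    """
    for rec in records:
        row = {"article_id": rec["article_id"], "ja_original": rec["ja_original"]}
        for key in OUTPUT_KEYS:
            row[key] = translate(rec["ja_original"], key)
        # ensure_ascii=False keeps the Japanese source readable in the file.
        out.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Copying `article_id` and `ja_original` from the input record makes it straightforward to align each output with its source article.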
Submissions will be evaluated using:
- Automatic metrics: document-level BLEU, COMET, etc.
- Human evaluation: the top three teams in the automatic evaluation
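To illustrate what document-level BLEU measures, the sketch below implements one common interpretation: BLEU computed with each whole article as a single segment, so n-gram matches are counted within articles rather than within aligned sentences. This matters for Tasks 2 and 3, where the order and number of lines may differ from the source. It is a simplified approximation, not the organizers' scorer. The official tool, tokenization, and settings are not specified here, and in practice a standard implementation such as sacreBLEU would be used. The function name `doc_bleu` is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def _ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def doc_bleu(hyps: Sequence[str], refs: Sequence[str], max_n: int = 4) -> float:
    """Simplified document-level BLEU on a 0-100 scale.

    Each hypothesis/reference pair is a whole article. N-gram statistics are
    collected per article and pooled over the test set before the geometric
    mean and brevity penalty are applied.
    Assumptions: whitespace tokenization, one reference per article, and no
    smoothing (the score is 0.0 if any n-gram order has no match).
    """
    if len(hyps) != len(refs):
        raise ValueError("hyps and refs must contain the same number of articles")
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = _ngram_counts(h, n), _ngram_counts(r, n)
            # Clipping: Counter "&" keeps the minimum count of each n-gram.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_prec)
```

Because statistics are pooled per article, a system that reorders sentences relative to the reference is penalized only through broken n-grams at sentence boundaries, not through misaligned sentence pairs.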
Regulations
- The dataset must not be used for any purpose other than this task.
- Paid or closed-source models (e.g., GPT-4) are not allowed for submission.
- Dev/Test data must not be used for training.
- Modification of training data and creation of synthetic data are allowed, provided that no data is leaked externally.
Agreement
Participants must sign a usage agreement with Jiji Press before accessing the source articles.
To obtain this article-level dataset and the Jiji 2020 data, follow these steps:
- Complete and sign the license agreement (available in English or Japanese). Please read the license agreement carefully.
- Scan and email the signed agreement to Jiji Press Ltd. (asaka -at- jiji.co.jp), and send the original copy of the agreement to the following address:
English:
ASAKA, Hidehiro
Sports Business Promotion Office
JIJI Press LTD.
5-15-8 Ginza, Chuo-ku,
Tokyo 104-8178, JAPAN
Japanese:
104-8178
東京都中央区銀座5-15-8
時事通信社スポーツ事業推進室
朝賀英裕
- Once Jiji Press Ltd. has received the original copy and approved the application, the organizers will email the applicant a link to download these corpora. (Please note that Jiji Press Ltd. will provide the applicant's email address to the organizers.)
When quoting text from the Jiji data in your papers or presentations, please anonymize any personal information it contains.
IMPORTANT DATES
| Milestone | Date |
|---|---|
| Shared Task Submission Deadline | October 20, 2025 |
| System Description Paper Submission Deadline (Shared Tasks) | November 10, 2025 |
| Review Feedback on System Description Papers | November 12, 2025 |
| Camera-ready Deadline | November 17, 2025 |
| Workshop Date | December 24, 2025 |

* All deadlines are at 11:59 PM UTC-12.
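Because UTC-12 is the last time zone on Earth to reach a given date, each deadline falls on the following calendar day in UTC and in Japan. The small sketch below converts a deadline date into UTC using only the standard `datetime` module; the helper name `deadline_in_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# UTC-12, the time zone in which all deadlines are stated.
UTC_MINUS_12 = timezone(timedelta(hours=-12))


def deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return 11:59 PM UTC-12 on the given date, expressed in UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=UTC_MINUS_12)
    return local.astimezone(timezone.utc)
```

For example, the shared task submission deadline of October 20, 2025 corresponds to 11:59 on October 21, 2025 in UTC, which is 20:59 on October 21 in Japan Standard Time (UTC+9).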
ORGANIZERS
- Naoto Shirai (shirai.n-hk -at- nhk.or.jp), NHK
- Hitoshi Ito (itou.h-ce -at- nhk.or.jp), NHK
- Kazutaka Kinugawa (kinugawa.k-jg -at- nhk.or.jp), NHK
- Hideya Mino (mino.h-gq -at- nhk.or.jp), NHK
ACKNOWLEDGEMENTS
We are deeply grateful to Hidehiro Asaka and Takayuki Kawakami for providing the valuable data used in this research.
These research results were obtained from commissioned research (No. 225) by the National Institute of Information and Communications Technology (NICT), Japan.
CHANGE LOG
2025-09-01: Update Dataset Information
2025-08-22: Update Dataset Information
2025-08-06: Dev data release
2025-07-04: Site opened