
JPO Patent Corpus for WAT2019


INTRODUCTION

The JPO Patent Corpus was constructed by the Japan Patent Office (JPO). It consists of a Chinese-Japanese, Korean-Japanese, and English-Japanese patent description corpus of 1M parallel sentences. The sentences are drawn from four sections defined by the International Patent Classification (IPC): Chemistry (Ch), Electricity (El), Mechanical engineering (Me), and Physics (Ph).

DETAIL

Datasets for Normal Tasks

These tasks evaluate the performance of a translation model for each language pair. Unlike the previous patent tasks at WAT2016-2017, new test sets have been added, as shown in the following table.

Corpus statistics:

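Language Pair  Data Type  File Name            Size  Sections:Ratios              Published Years       Sentence Alignment
ZH<-->JA       TRAIN      train.{zh,ja}     250,000  Ch/El/Me/Ph:25%/25%/25%/25%  2011-2013             Automatic
ZH<-->JA       DEV        dev.{zh,ja}         2,000  Ch/El/Me/Ph:25%/25%/25%/25%  2011-2013             Automatic
ZH<-->JA       DEVTEST    devtest.{zh,ja}     2,000  Ch/El/Me/Ph:25%/25%/25%/25%  2011-2013             Automatic
ZH<-->JA       TEST       test-n1.{zh,ja}     2,000  Ch/El/Me/Ph:25%/25%/25%/25%  2011-2013             Automatic
ZH<-->JA       TEST       test-n2.{zh,ja}     3,000  Ch/El/Me/Ph:Unknown          2016-2017             Manual
ZH<-->JA       TEST       test-n3.{zh,ja}       204  Ch/El/Me/Ph:Unknown          2016-2017             Manual
ZH<-->JA       TEST       test-n.{zh,ja}      5,204                               2011-2013, 2016-2017  Automatic/Manual
KO<-->JA       TRAIN      train.{ko,ja}     250,000  Ch/El/Me/Ph:25%/25%/25%/25%  2011-2013             Automatic
KO<-->JA       DEV        dev.{ko,ja}         2,000  Ch/El/Me/Ph:25%/25%/25%/25%  2011-2013             Automatic
KO<-->JA       DEVTEST    devtest.{ko,ja}     2,000  Ch/El/Me/Ph:25%/25%/25%/25%  2011-2013             Automatic
KO<-->JA       TEST       test-n1.{ko,ja}     2,000  Ch/El/Me/Ph:25%/25%/25%/25%  2011-2013             Automatic
KO<-->JA       TEST       test-n2.{ko,ja}     3,000  Ch/El/Me/Ph:Unknown          2016-2017             Manual
KO<-->JA       TEST       test-n3.{ko,ja}       230  Ch/El/Me/Ph:Unknown          2016-2017             Manual
KO<-->JA       TEST       test-n.{ko,ja}      5,230                               2011-2013, 2016-2017  Automatic/Manual
EN<-->JA       TRAIN      train.{en,ja}     250,000  Ch/El/Me/Ph:25%/25%/25%/25%  2011-2013             Automatic
EN<-->JA       DEV        dev.{en,ja}         2,000  Ch/El/Me/Ph:25%/25%/25%/25%  2011-2013             Automatic
EN<-->JA       DEVTEST    devtest.{en,ja}     2,000  Ch/El/Me/Ph:25%/25%/25%/25%  2011-2013             Automatic
EN<-->JA       TEST       test-n1.{en,ja}     2,000  Ch/El/Me/Ph:25%/25%/25%/25%  2011-2013             Automatic
EN<-->JA       TEST       test-n2.{en,ja}     3,000  Ch/El/Me/Ph:Unknown          2016-2017             Manual
EN<-->JA       TEST       test-n3.{en,ja}       668  Ch/El/Me/Ph:Unknown          2016-2017             Manual
EN<-->JA       TEST       test-n.{en,ja}      5,668                               2011-2013, 2016-2017  Automatic/Manual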
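
After downloading the corpus, it can be useful to confirm that each file has the size listed in the table. The following is a minimal Python sketch of such a check, not an official tool. It assumes that every file holds one sentence per line in UTF-8 and that all files for a language pair sit in a single directory; neither is stated on this page. The function names `expected_sizes`, `count_lines`, and `check_pair` are hypothetical. The test-n size is computed as the sum of test-n1, test-n2, and test-n3, which matches the table (5,204 / 5,230 / 5,668).

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

# test-n3 sizes per non-Japanese language, taken from the table above.
TEST_N3_SIZES = {"zh": 204, "ko": 230, "en": 668}


def expected_sizes(src_lang: str) -> dict[str, int]:
    """Return the expected sentence count for each file stem."""
    n3 = TEST_N3_SIZES[src_lang]
    return {
        "train": 250_000,
        "dev": 2_000,
        "devtest": 2_000,
        "test-n1": 2_000,
        "test-n2": 3_000,
        "test-n3": n3,
        # The table's test-n size equals test-n1 + test-n2 + test-n3.
        "test-n": 2_000 + 3_000 + n3,
    }


def count_lines(path: Path) -> int:
    """Count lines, assuming one sentence per line in UTF-8 (an assumption)."""
    with path.open(encoding="utf-8") as f:
        return sum(1 for _ in f)


def check_pair(corpus_dir: Path, src_lang: str) -> list[tuple[str, int, Optional[int]]]:
    """Return (file name, expected, actual) for every mismatch.

    A missing file is reported with actual set to None.
    """
    problems = []
    for stem, expected in expected_sizes(src_lang).items():
        for lang in (src_lang, "ja"):
            path = corpus_dir / f"{stem}.{lang}"
            actual = count_lines(path) if path.exists() else None
            if actual != expected:
                problems.append((path.name, expected, actual))
    return problems
```

For example, `check_pair(Path("jpo/zh-ja"), "zh")` would return an empty list if every ZH<-->JA file is present with the expected size; the directory name `jpo/zh-ja` is only a placeholder.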

Datasets for Expression Pattern Task

This task evaluates the performance of a translation model for each predefined category of expression patterns: title of invention (TIT), abstract (ABS), scope of claim (CLM), or description (DES). The train/dev/devtest sets are the same data as those of the normal ZH<-->JA tasks. The test set consists of sentences, each annotated with its corresponding expression-pattern category.

Corpus statistics:

Language Pair  Data Type  File Name        Size   Sections     Published Years
ZH->JA         TEST       test-ep.{zh,ja}  1,151  Ch/El/Me/Ph  2011-2013
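
Because this task is evaluated per category, system output has to be split by category before scoring. The sketch below shows one way to do that in Python. This page does not specify how the category annotation is stored, so the sketch simply assumes the labels are already available as a list of strings parallel to the sentences. The name `group_by_category` is hypothetical, and the scoring step itself is left out.

```python
from collections import defaultdict

# Category codes as named in the task description above.
CATEGORIES = ("TIT", "ABS", "CLM", "DES")


def group_by_category(labels, hypotheses, references):
    """Group (hypothesis, reference) pairs by expression-pattern category.

    Assumes the three sequences are parallel: item i of each one
    belongs to the same test sentence.
    """
    if not (len(labels) == len(hypotheses) == len(references)):
        raise ValueError("labels, hypotheses and references must have equal length")
    groups = defaultdict(list)
    for label, hyp, ref in zip(labels, hypotheses, references):
        if label not in CATEGORIES:
            raise ValueError(f"unknown category: {label!r}")
        groups[label].append((hyp, ref))
    return dict(groups)
```

Each resulting group can then be passed to whatever evaluation metric is in use, giving one score per category rather than a single score over all 1,151 sentences.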

HOW TO OBTAIN

  1. Complete and sign the license agreement (English/Japanese).
  2. Scan and email the signed agreement to the Japan Patent Office (PA0630 -at- jpo.go.jp), and also send the original of the agreement to the following address by mail:

    Patent Information Policy Planning Office
    General Coordination Division
    Japan Patent Office
    3-4-3 Kasumigaseki Chiyoda-ku,
    Tokyo 100-8915, Japan

    100-8915
    東京都千代田区霞が関3-4-3
    特許庁総務部総務課 情報技術統括室
    特許情報利用推進班

  3. Once the JPO receives the original agreement and approves the application, the WAT organizers or JPO staff will email the applicant a link to download the corpus.

CONTACT

For questions, comments, etc., please email "wat-organizer -at- googlegroups -dot- com".


CHANGE LOG

2019-5-10: site opened


NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2019-5-10