JPO Patent Corpus

Registration for use of the JPO Patent Corpus for WAT 2017 is now open (2017/6/12).

INTRODUCTION

The JPO Patent Corpus was constructed by the Japan Patent Office (JPO). It consists of Chinese-Japanese, Korean-Japanese, and English-Japanese patent description corpora, each with 1M parallel training sentences. Each corpus is divided into four sections based on the International Patent Classification (IPC): Chemistry, Electricity, Mechanical Engineering, and Physics.

DETAIL

JPO Patent corpus includes:

The numbers of sentences are as follows:

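Parallel Corpus  Data Type  File Name    Section                 Number of sentences
CJ               TRAIN      train.txt    Chemistry               250,000
CJ               TRAIN      train.txt    Electricity             250,000
CJ               TRAIN      train.txt    Mechanical Engineering  250,000
CJ               TRAIN      train.txt    Physics                 250,000
CJ               DEV        dev.txt      Chemistry               500
CJ               DEV        dev.txt      Electricity             500
CJ               DEV        dev.txt      Mechanical Engineering  500
CJ               DEV        dev.txt      Physics                 500
CJ               DEVTEST    devtest.txt  Chemistry               500
CJ               DEVTEST    devtest.txt  Electricity             500
CJ               DEVTEST    devtest.txt  Mechanical Engineering  500
CJ               DEVTEST    devtest.txt  Physics                 500
CJ               TEST       test.txt     Chemistry               500
CJ               TEST       test.txt     Electricity             500
CJ               TEST       test.txt     Mechanical Engineering  500
CJ               TEST       test.txt     Physics                 500
KJ               TRAIN      train.txt    Chemistry               250,000
KJ               TRAIN      train.txt    Electricity             250,000
KJ               TRAIN      train.txt    Mechanical Engineering  250,000
KJ               TRAIN      train.txt    Physics                 250,000
KJ               DEV        dev.txt      Chemistry               500
KJ               DEV        dev.txt      Electricity             500
KJ               DEV        dev.txt      Mechanical Engineering  500
KJ               DEV        dev.txt      Physics                 500
KJ               DEVTEST    devtest.txt  Chemistry               500
KJ               DEVTEST    devtest.txt  Electricity             500
KJ               DEVTEST    devtest.txt  Mechanical Engineering  500
KJ               DEVTEST    devtest.txt  Physics                 500
KJ               TEST       test.txt     Chemistry               500
KJ               TEST       test.txt     Electricity             500
KJ               TEST       test.txt     Mechanical Engineering  500
KJ               TEST       test.txt     Physics                 500
EJ               TRAIN      train.txt    Chemistry               250,000
EJ               TRAIN      train.txt    Electricity             250,000
EJ               TRAIN      train.txt    Mechanical Engineering  250,000
EJ               TRAIN      train.txt    Physics                 250,000
EJ               DEV        dev.txt      Chemistry               500
EJ               DEV        dev.txt      Electricity             500
EJ               DEV        dev.txt      Mechanical Engineering  500
EJ               DEV        dev.txt      Physics                 500
EJ               DEVTEST    devtest.txt  Chemistry               500
EJ               DEVTEST    devtest.txt  Electricity             500
EJ               DEVTEST    devtest.txt  Mechanical Engineering  500
EJ               DEVTEST    devtest.txt  Physics                 500
EJ               TEST       test.txt     Chemistry               500
EJ               TEST       test.txt     Electricity             500
EJ               TEST       test.txt     Mechanical Engineering  500
EJ               TEST       test.txt     Physics                 500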
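
As a quick sanity check on the table above, the per-file and per-pair totals can be tallied in a few lines. This is only an illustrative sketch: the constant and function names are hypothetical, and the counts are copied from the table rather than read from the corpus files, whose internal format is not described here.

```python
# Sentence counts per section, copied from the table above.
# All names below are illustrative; nothing here reads the actual corpus files.
SECTIONS = ["Chemistry", "Electricity", "Mechanical Engineering", "Physics"]
PER_SECTION = {
    "train.txt": 250_000,
    "dev.txt": 500,
    "devtest.txt": 500,
    "test.txt": 500,
}
PAIRS = ["CJ", "KJ", "EJ"]


def split_totals():
    """Return the total number of sentences per file for one language pair."""
    return {name: count * len(SECTIONS) for name, count in PER_SECTION.items()}


def pair_total():
    """Return the total number of sentences across all files of one pair."""
    return sum(split_totals().values())


if __name__ == "__main__":
    for name, total in split_totals().items():
        print(f"{name}: {total:,}")
    print(f"per language pair: {pair_total():,}")
    print(f"all pairs: {pair_total() * len(PAIRS):,}")
```

Under these counts, each language pair has 1,000,000 training sentences plus 2,000 sentences in each of the DEV, DEVTEST, and TEST files.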

The JPO Patent Corpus was constructed from Chinese-Japanese, Korean-Japanese, and English-Japanese patent description sentence pairs.
The data include information assets created jointly by the JPO and the National Institute of Information and Communications Technology (NICT) under an agreement between the JPO and NICT.

Another patent corpus is also available: the NTCIR-10 PatentMT Research Purpose Use of Test Collection.

HOW TO OBTAIN

AGREEMENT

English (form), Japanese (form)

CONTACT

For questions or comments, please email "wat -at- nlp.ist.i.kyoto-u.ac.jp".

CHANGE LOG

2017-6-12: updated (for WAT2017)
2016-6-13: site open


JST (Japan Science and Technology Agency)
NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2017-6-12