
ASPEC

(Asian Scientific Paper Excerpt Corpus)

The ASPEC license agreement was updated on 18th June, 2014 to better accommodate researchers working for companies.

INTRODUCTION

ASPEC, the Asian Scientific Paper Excerpt Corpus, was constructed by the Japan Science and Technology Agency (JST) in collaboration with the National Institute of Information and Communications Technology (NICT). It consists of a Japanese-English paper abstract corpus of about 3M parallel sentences (ASPEC-JE) and a Japanese-Chinese paper excerpt corpus of about 680K parallel sentences (ASPEC-JC). This corpus is one of the achievements of the Japanese-Chinese machine translation project, which ran in Japan from 2006 to 2010 (see this Japanese page). Before using ASPEC, please read and accept the terms of the following license agreement.

With the increasing number of scientific papers published worldwide, there is a growing demand for machine translation of scientific papers. ASPEC is the first parallel corpus to focus on this domain. It aims to promote machine translation research on scientific papers.

DETAIL

ASPEC includes the following files. The number of sentences in each file is as follows:

Parallel Corpus   Data Type   File Name     Number of Sentences
ASPEC-JE          TRAIN       train1.txt    1,000,000
ASPEC-JE          TRAIN       train2.txt    1,000,000
ASPEC-JE          TRAIN       train3.txt    1,008,500
ASPEC-JE          DEV         dev.txt       1,790
ASPEC-JE          DEVTEST     devtest.txt   1,784
ASPEC-JE          TEST        test.txt      1,812
ASPEC-JC          TRAIN       train.txt     672,315
ASPEC-JC          DEV         dev.txt       2,090
ASPEC-JC          DEVTEST     devtest.txt   2,148
ASPEC-JC          TEST        test.txt      2,107
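
As a quick sanity check after downloading, the counts in the table above can be compared against the files you received. The following Python sketch is only an illustration: the sentence counts are copied from the table, but the helper names are hypothetical, and the assumption that each file stores exactly one sentence pair per line is not stated on this page and should be verified against the distributed data.

```python
import os

# Sentence counts copied from the table above.
EXPECTED_COUNTS = {
    "ASPEC-JE": {
        "train1.txt": 1_000_000,
        "train2.txt": 1_000_000,
        "train3.txt": 1_008_500,
        "dev.txt": 1_790,
        "devtest.txt": 1_784,
        "test.txt": 1_812,
    },
    "ASPEC-JC": {
        "train.txt": 672_315,
        "dev.txt": 2_090,
        "devtest.txt": 2_148,
        "test.txt": 2_107,
    },
}


def corpus_totals(counts=EXPECTED_COUNTS):
    """Sum the per-file sentence counts for each parallel corpus."""
    return {name: sum(files.values()) for name, files in counts.items()}


def count_lines(path):
    """Count the lines in a file without loading it all into memory."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def matches_expected(corpus, path):
    """Compare a file's line count with the table.

    ASSUMPTION: one sentence pair per line (not confirmed by this page).
    """
    expected = EXPECTED_COUNTS[corpus][os.path.basename(path)]
    return count_lines(path) == expected
```

Summing the table this way gives 3,013,886 sentences for ASPEC-JE and 678,660 for ASPEC-JC, which is consistent with the rounded 3M and 680K figures in the introduction.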

ASPEC-JE was constructed from Japanese-English scientific paper abstracts, which are the property of the Japan Science and Technology Agency (JST). The National Institute of Information and Communications Technology (NICT) created the 1-to-1 sentence alignments using the method of Utiyama and Isahara (MT Summit XI, 2007).

ASPEC-JC was constructed by manually translating Japanese scientific papers into Chinese. The Japanese scientific papers are either the property of the Japan Science and Technology Agency (JST) or stored in Japan's Largest Electronic Journal Platform for Academic Societies (J-STAGE). The unit of manual translation is the paragraph, and the paragraphs are selected so as to maximize the coverage of word types.
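
This page does not say how the paragraphs were actually selected. Purely as an illustration of what "maximizing the coverage of word types" can mean, the Python sketch below uses a simple greedy heuristic: at each step it picks the paragraph that adds the most word types not yet covered. The function name, the `budget` parameter, and the greedy strategy itself are assumptions made for this example, and the input is assumed to be already tokenized, since Japanese text has no whitespace word boundaries.

```python
def select_paragraphs(paragraphs, budget):
    """Greedily choose up to `budget` paragraphs to cover many word types.

    paragraphs: list of token lists (tokenization is assumed done elsewhere).
    Returns (selected indices in selection order, set of covered word types).
    NOTE: an illustrative heuristic, not the procedure used to build ASPEC-JC.
    """
    covered = set()
    remaining = {i: set(tokens) for i, tokens in enumerate(paragraphs)}
    selected = []

    while remaining and len(selected) < budget:
        # Pick the paragraph with the largest number of unseen word types;
        # ties are broken in favor of the lowest index.
        best = max(remaining, key=lambda i: (len(remaining[i] - covered), -i))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining paragraph adds a new word type
        covered |= gain
        selected.append(best)
        del remaining[best]

    return selected, covered
```

For example, with the toy input `[["a", "b"], ["b", "c", "d"], ["a"], ["e"]]` and a budget of 2, the sketch first selects the second paragraph (three new types) and then the first (one new type, lowest index among the ties).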

CAUTION: This page lists points that you should pay attention to when using this corpus.

WORKSHOP ON ASIAN TRANSLATION

Using ASPEC, a new open evaluation campaign for machine translation of scientific papers, the Workshop on Asian Translation (WAT), will be held in October 2015.

HOW TO OBTAIN

  1. Complete and sign the license agreement
  2. Scan and email the signed agreement to JST (aspec -at- jst.go.jp), and also send the original copy of the agreement to the following address:

    OKAI, Masayuki
    Department of Information Planning
    Japan Science and Technology Agency
    5-3, Yonbancho, Chiyoda-ku,
    Tokyo 102-8666, Japan

    102-8666
    東京都千代田区四番町5-3
    サイエンスプラザ
    国立研究開発法人科学技術振興機構
    情報企画部 システム高度化グループ
    岡井 将之

  3. After receiving your email, JST will send the download link for the corpus

AGREEMENT

English (form, sample), Japanese (form, sample)

Please cite the following paper when you use ASPEC.
@InProceedings{NAKAZAWA16.621,
  author = {Toshiaki Nakazawa and Manabu Yaguchi and Kiyotaka Uchimoto and Masao Utiyama and Eiichiro Sumita and Sadao Kurohashi and Hitoshi Isahara},
  title = {ASPEC: Asian Scientific Paper Excerpt Corpus},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2016)},
  year = {2016},
  month = {may},
  date = {26-31},
  address = {Portorož, Slovenia},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-9-1},
  language = {english},
  pages = {2204-2208}
}
    

CONTACT

For questions, comments, etc., please send an email to "aspec -at- jst.go.jp".

CHANGE LOG

2015-02-26: updated the WAT information.
2014-06-19: updated the AGREEMENT to be more accommodating to researchers working for companies.
2014-03-07: added the announcement about the special treatment for applicants up to March 2014
2014-01-30: added the mailing address for the original copy of the agreement
2014-01-22: site open


JST (Japan Science and Technology Agency)
Last Modified: 2015-02-26