This page lists known issues in the corpus that users should be aware of.
Most date expressions at the end of the Japanese
sentences have no corresponding expression in the
English sentences.
e.g.
B-94A0894379 ||| 3 ||| 材料開発における発想支援のためには,ユーザの側の 操作が重要であるため,原子レベルで の物質操作のためのインターフェイスを 開発した[1994.8] ||| Because user operation is important for the idea support in material development, an interface for a substance operation at atomic level was developed.
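Running "grep -c -F '[19' file.txt" on the train, dev, devtest and test data
gives 1994, 17, 19 and 23 matching lines, respectively. (The -F option makes grep
treat '[' literally; without it, '[19' is rejected as an unmatched bracket expression.)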
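As a cross-check of these counts without depending on grep's pattern syntax, the following is a minimal Python sketch that counts lines containing the literal substring "[19", which is what the grep command is meant to count. The split file names are hypothetical; adjust them to your local copy of the corpus.

```python
from pathlib import Path

def count_lines_containing(path, needle):
    """Count lines that contain `needle` as a literal substring (like grep -c -F)."""
    # newline="\n" splits lines only on LF, matching how grep counts lines.
    with open(path, encoding="utf-8", newline="\n") as f:
        return sum(1 for line in f if needle in line)

# Hypothetical file names for the four splits.
for split in ["train.txt", "dev.txt", "devtest.txt", "test.txt"]:
    if Path(split).exists():
        print(split, count_lines_containing(split, "[19"))
```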
For the WAT evaluation, date expressions that have no
corresponding expression in the English sentence will be removed.
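This page does not describe how the removal is done. The following is a minimal Python sketch under one loud assumption: that the date expression always has the form "[YYYY.M]" or "[YYYY.MM]" at the very end of the Japanese field, as in the example above. The field index also assumes the "ID ||| N ||| Japanese ||| English" layout of that example; both are illustrative choices, not the official WAT procedure.

```python
import re

# ASSUMPTION: trailing dates look like "[1994.8]" (four-digit year, one- or
# two-digit month), optionally followed by whitespace at the end of the field.
TRAILING_DATE = re.compile(r"\s*\[\d{4}\.\d{1,2}\]\s*$")

def strip_trailing_date(ja):
    """Remove a trailing date expression such as '[1994.8]' from a Japanese sentence."""
    return TRAILING_DATE.sub("", ja)

def clean_line(line, ja_field=2):
    """Strip the trailing date from the Japanese field of a ' ||| '-separated line.

    ja_field=2 is an assumption based on the 'ID ||| N ||| Japanese ||| English'
    layout shown in the example.
    """
    fields = line.rstrip("\n").split(" ||| ")
    fields[ja_field] = strip_trailing_date(fields[ja_field])
    return " ||| ".join(fields)

print(strip_trailing_date("インターフェイスを 開発した[1994.8]"))  # インターフェイスを 開発した
```

Dates that appear elsewhere in the sentence are left untouched because the pattern is anchored to the end of the field.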
The Japanese word "標題" ("title") appears several times in the dev, devtest and test data. However, it never appears in the training data, so it is always out-of-vocabulary (OOV).
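A word like this can be found with a simple substring check: if a string never occurs anywhere in the training text, no word-level segmentation of that text can produce it as a vocabulary entry. The following minimal Python sketch assumes each split is already loaded as a list of Japanese sentences; the split names and toy data are illustrative only.

```python
def occurrence_counts(word, splits):
    """Count, per split, the sentences that contain `word` as a substring.

    `splits` maps a split name (e.g. "train") to a list of Japanese sentences.
    """
    return {name: sum(word in s for s in sents) for name, sents in splits.items()}

def is_always_oov(word, splits, train="train"):
    """True if `word` is absent from training but present in some other split."""
    counts = occurrence_counts(word, splits)
    others = [c for name, c in counts.items() if name != train]
    return counts[train] == 0 and any(c > 0 for c in others)

# Toy data for illustration only.
splits = {
    "train": ["材料開発における発想支援", "インターフェイスを開発した"],
    "dev": ["標題：発想支援システム"],
    "test": ["標題の例"],
}
print(occurrence_counts("標題", splits))
```

This word-level reasoning does not apply to subword models, which may still compose the word from smaller pieces.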
There are some incomplete Chinese sentences.
e.g.
NICT_JC_SP-IPSJ-JNL4312017-sec3.-par1-sen31 ||| の位置へ,中央の正規表現では”ki”にマッチする位置へ,最後の正規表現では”kik”にマッチするMigemoは,ユーザが1文字入力するごとに,指定された読みで始まる単語を正規表現に動的に展開してインクリメンタル検索を行う. ||| 。。
Running "grep -c '||| 。。' file.txt" on the train, dev, devtest and test data gives 10, 0, 0 and 1 matching lines, respectively.
For the WAT evaluation, these sentences will be removed.
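A minimal Python sketch of such a filter, assuming the lines are read as plain text. To stay consistent with the counts above, it uses exactly the grep criterion (the line contains "||| 。。"); a stricter or looser definition of "incomplete" would be an assumption beyond what this page states.

```python
def is_incomplete(line):
    """Mirror the grep criterion: the line contains '||| 。。'."""
    return "||| 。。" in line

def filter_incomplete(lines):
    """Keep only the lines that do not match the incomplete-Chinese-sentence criterion."""
    return [line for line in lines if not is_incomplete(line)]

# Toy lines for illustration; the IDs are hypothetical.
lines = [
    "ID-1 ||| の位置へ,中央の正規表現では ||| 。。",
    "ID-2 ||| 文を翻訳する. ||| 翻译句子。",
]
print(len(filter_incomplete(lines)))  # 1
```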
2014-05-05: date expressions at the end of Japanese sentences
JST (Japan Science and Technology Agency)
Last Modified: 2014-05-15