Tokyo Stock Exchange is one of the largest capital markets in the world, with over 3,600 listed companies as of the end of 2018. Listed companies are obliged to disclose material information, including financial statements, corporate actions and corporate governance policies, to the public in a timely manner. These 'timely disclosure documents' are an important basis for investment decisions.
Global investors have invested in Japanese companies and now account for 30-40% of shareholdings. Although tens of thousands of original Japanese documents are disclosed every year (77,000 documents in 2018), English disclosure documents remain limited in availability. Strong demand for machine translation is expected from both listed companies and global investors, because Japanese-English translation needs to be done in a timely manner.
The 'Timely Disclosure Documents Corpus' was constructed by Japan Exchange Group (JPX) and provided to WAT to encourage the development of machine translation. The corpus, built from past timely disclosure documents, consists of 1.4M Japanese-English parallel sentences.
Timely disclosure documents contain important figures (e.g. sales, profits, dates) and proper nouns (e.g. names of people, places, companies, businesses and products). This information is critical for investors, so mistranslations must be avoided and overall translation quality should be improved.
You can see the original 'Timely Disclosure Documents' below:
The samples of this corpus are as follows:
| Japanese | English |
|---|---|
| 株式会社日本取引所グループ | Japan Exchange Group, Inc. |
| 業績予想及び配当予想の修正に関するお知らせ | Notice of Revision to Earnings Forecast and Dividend Forecast |
| 当社は、2017年10月30日に開示しました2018年3月期(2017年4月1日〜2018年3月31日)の通期連結業績予想及び期末の1株当たり配当予想について、下記のとおり修正することとしましたので、お知らせいたします。 | We hereby announce that the consolidated earnings forecast and year-end dividend forecast for the fiscal year ending March 31, 2018 released on October 30, 2017 have been revised as follows. |
| 剰余金の配当に関するお知らせ | Notice of Dividend from Surplus |
| これにより、2018年3月期の期末の1株当たり配当金は、普通配当33円に加え、記念配当10円を合わせた43円となります。 | As a result, the year-end dividend per share for the fiscal year ended March 31, 2018 will be ¥43 (ordinary dividend of ¥33 plus commemorative dividend of ¥10). |
| 投資活動によるキャッシュ・フローは、無形資産の取得による支出105億37百万円等により、261億64百万円の支出となりました。 | There was cash outflow of ¥26,164 million from investment activities due mainly to ¥10,537 million in purchase of intangible assets. |
| 発行済株式数に占める当社保有株式の比率 | Shareholding ratio of JPX |
| SGXが保有する自己株式(515,063株)を含む。 | Including the shares held by SGX as treasury stock (515,063 shares). |
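The samples show how figures change form between the two languages: 105億37百万円 (105 × 10^8 + 37 × 10^6 yen) becomes ¥10,537 million, and 2018年3月期 becomes the fiscal year ending March 31, 2018. The following is a minimal sketch of these two conversions, useful for checking that figures survive translation. The helper names (`yen_amount`, `to_million_yen_english`, `fiscal_year_end`) are hypothetical and not part of the corpus or any WAT tooling. The sketch assumes that amounts are written as digits followed by the units 億, 百万 or 万, and that a fiscal period ends on the last day of the named month. Both assumptions hold for the samples above but are not guaranteed for every disclosure. It requires Python 3.9+.

```python
import calendar
import re

# Multipliers for Japanese numeral units seen in financial amounts.
# ASSUMPTION: only these three units occur; 兆 (10**12) etc. are not handled.
UNITS = {"億": 10**8, "百万": 10**6, "万": 10**4}

# One segment is a run of digits optionally followed by a unit, e.g. "105億" or "37百万".
_SEGMENT = r"(\d[\d,]*)(億|百万|万)?"

# English month names, hard-coded so the output does not depend on the system locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def yen_amount(text: str) -> int:
    """Convert a Japanese amount such as '105億37百万円' to an integer number of yen."""
    body = text.removesuffix("円")
    if not re.fullmatch(f"(?:{_SEGMENT})+", body):
        raise ValueError(f"unsupported amount: {text!r}")
    # findall yields (digits, unit) pairs; unit is "" when absent, i.e. plain yen.
    return sum(int(digits.replace(",", "")) * UNITS.get(unit, 1)
               for digits, unit in re.findall(_SEGMENT, body))


def to_million_yen_english(text: str) -> str:
    """Render an amount in the '¥N million' style used by the English samples."""
    yen = yen_amount(text)
    if yen % 10**6:
        raise ValueError(f"not a whole number of millions: {text!r}")
    return f"¥{yen // 10**6:,} million"


def fiscal_year_end(text: str) -> str:
    """Convert a fiscal-period label such as '2018年3月期' to 'March 31, 2018'.

    ASSUMPTION: the period ends on the last day of the named month.
    """
    m = re.fullmatch(r"(\d{4})年(\d{1,2})月期", text)
    if not m:
        raise ValueError(f"unsupported fiscal period: {text!r}")
    year, month = int(m[1]), int(m[2])
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return f"{MONTHS[month - 1]} {last_day}, {year}"


if __name__ == "__main__":
    print(to_million_yen_english("105億37百万円"))  # ¥10,537 million
    print(to_million_yen_english("261億64百万円"))  # ¥26,164 million
    print(fiscal_year_end("2018年3月期"))          # March 31, 2018
```

Raising on unrecognized input, rather than silently returning a partial value, keeps a checker built on these helpers from reporting a mistranslated figure as correct.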
If you need more samples, you can obtain them from here.
Timely Disclosure Documents Corpus includes:
The number of sentences in each data set is as follows:
| Data Type | File Name | Number of sentences | Number of unique pairs | Number of original documents |
|---|---|---|---|---|
The DEV and TEST datasets contain sentences that focus on the translation quality of proper nouns and figures.
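Because DEV and TEST emphasize figures, a simple figure-match check can flag numbers that a system output gets wrong. The following is a minimal sketch that compares the digit-written numbers in a system translation against the reference English. The helpers `figures` and `figure_errors` are hypothetical, and this is not the official WAT evaluation metric. The sketch only compares numbers written as digits, so it would count a value expressed differently (e.g. "¥10.537 billion" versus "¥10,537 million") as an error. It does not check proper nouns.

```python
import re
from collections import Counter

# Digit strings with optional thousands separators and decimals, e.g. "26,164", "43", "2018".
_NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def figures(sentence: str) -> Counter:
    """Return the multiset of numeric figures in a sentence, with commas removed."""
    return Counter(m.group().replace(",", "") for m in _NUMBER.finditer(sentence))


def figure_errors(hypothesis: str, reference: str) -> tuple[Counter, Counter]:
    """Compare figures in a system translation against the reference translation.

    Returns (missing, extra): figures in the reference that the hypothesis lacks,
    and figures in the hypothesis that the reference lacks.
    """
    hyp, ref = figures(hypothesis), figures(reference)
    # Counter subtraction keeps only positive counts, so matched figures cancel out.
    return ref - hyp, hyp - ref


if __name__ == "__main__":
    reference = ("There was cash outflow of ¥26,164 million from investment activities "
                 "due mainly to ¥10,537 million in purchase of intangible assets.")
    hypothesis = ("Cash outflow from investing activities was ¥26,164 million, "
                  "mainly due to ¥10,573 million in purchases of intangible assets.")
    missing, extra = figure_errors(hypothesis, reference)
    print(missing)  # Counter({'10537': 1})
    print(extra)    # Counter({'10573': 1})
```

Using a multiset rather than a set means that a figure appearing twice in the reference must also appear twice in the output.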
Further details are as follows:
| Item | Details |
|---|---|
| Language pair | Japanese - English |
| Source documents | Timely Disclosure Documents (16,292 documents) |
| Author of source documents | Companies listed on Tokyo Stock Exchange |
| Disclosure date of source documents | January 2016 to June 2018 |
| Sort order of sentences | In no particular order |
NOTE: This section summarizes important points for using this corpus.
For questions, comments, etc., please email "wat-organizer -at- googlegroups -dot- com".
2019-06-21: update DETAILS
2019-05-11: site open
NICT (National Institute of Information and Communications Technology)
Last Modified: 2019-05-11