For all subtasks
For English tokenization
For Japanese segmentation
For Chinese segmentation
For Korean segmentation
- mecab-ko
  (https://bitbucket.org/eunjeon/mecab-ko/downloads/mecab-0.996-ko-0.9.1.tar.gz)
- mecab-ko-dic
  (https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-1.6.1-20140814.tar.gz)
Install procedure
For Hindi segmentation
For character conversion
For extracting texts from recipe JSON files
Java
- Version 7 Update 60 (JDK, JRE)
  (for Stanford Word Segmenter)
NICT (National Institute of Information and Communications Technology)
Kyoto University
Last Modified: 2020-07-08