References

Note

Please edit this document if anything is wrong or missing. Last updated on October 24, 2015.

Korean morpheme analyzer tools

When you’re analyzing Korean text, the most basic task you need to perform is morphological analysis. There are several libraries in various programming languages to achieve this:

C/C++

Java/Scala

Python

  • KoNLPy (2014) GPL v3+
    • By Lucy Park (Seoul National University)
    • Wrapper for Hannanum, KKMA, KOMORAN, twitter-korean-text, MeCab-ko
    • Tools for Hangul/Korean manipulation
  • UMorpheme (2014) MIT
    • By Kyunghoon Kim (UNIST)
    • Wrapper for MeCab-ko for online usage

R

  • KoNLP (2011) GPL v3
    • By Heewon Jeon
    • Wrapper for Hannanum

Others

Corpora

  • Korea University Korean Corpus, 1995.
    • 10M tokens of Korean from the 1970s-90s
  • HANTEC 2.0, KISTI & Chungnam National University, 1998-2003.
    • 120,000 test documents (237MB)
    • 50 TREC-type questions for QA (48KB)
  • HKIB-40075, KISTI & Hankook Ilbo, 2002.
    • 40,075 test documents for text categorization (88MB)
  • KAIST Corpus, KAIST, 1997-2005.
  • Sejong Corpus, National Institute of the Korean Language, 1998-2007.
  • Yonsei Corpus, Yonsei University, 1987.
    • 42M tokens of Korean since the 1960s
  • BoRA Language Resource Bank, KAIST

Other NLP tools

  • Hangulize - By Heungsub Lee Python
    • Tool for transcribing 38+ languages into Hangul
  • Hanja - By Sumin Byeon Python
    • Hanja-to-Hangul transcriber
  • Jamo - By Joshua Dong Python
    • Hangul syllable decomposition and synthesis
  • KoreanParser - By DongHyun Choi, Jungyeul Park, Key-Sun Choi (KAIST) Java
    • Language parser
  • Korean - By Heungsub Lee Python
    • Package for attaching particles (josa) in sentences
  • go_hangul (2012) - By Homin Lee Go BSD
    • Tools for Hangul manipulation [docs]
  • Speller (Pusan National University)
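
Two of the tasks listed above, Hangul syllable decomposition/synthesis and particle (josa) attachment, can be illustrated with plain Unicode arithmetic. The following is a minimal sketch using only the Python standard library. It is not the API of the Jamo or Korean packages, and every function name in it (`decompose`, `compose`, `attach_topic_particle`) is hypothetical. It assumes precomposed syllables in the range U+AC00 to U+D7A3, and handles only the topic particle 은/는 as an example.

```python
# Sketch only: hypothetical helper names, not taken from any listed package.
HANGUL_BASE = 0xAC00                      # first precomposed syllable, "가"
N_VOWELS, N_TAILS = 21, 28                # 19 leads x 21 vowels x 28 tails

LEADS = [chr(0x1100 + i) for i in range(19)]          # initial consonants
VOWELS = [chr(0x1161 + i) for i in range(N_VOWELS)]   # medial vowels
TAILS = [""] + [chr(0x11A8 + i) for i in range(27)]   # "" means no final consonant


def decompose(syllable):
    """Split one precomposed Hangul syllable into (lead, vowel, tail) jamo."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < 19 * N_VOWELS * N_TAILS:
        raise ValueError("not a precomposed Hangul syllable: %r" % syllable)
    lead, rest = divmod(index, N_VOWELS * N_TAILS)
    vowel, tail = divmod(rest, N_TAILS)
    return LEADS[lead], VOWELS[vowel], TAILS[tail]


def compose(lead, vowel, tail=""):
    """Inverse of decompose: build one syllable from its jamo."""
    index = (LEADS.index(lead) * N_VOWELS + VOWELS.index(vowel)) * N_TAILS
    return chr(HANGUL_BASE + index + TAILS.index(tail))


def attach_topic_particle(word):
    """Append 은 after a final consonant, 는 otherwise.

    Raises ValueError if the last character is not a Hangul syllable.
    """
    has_tail = decompose(word[-1])[2] != ""
    return word + ("은" if has_tail else "는")
```

For instance, `attach_topic_particle("한국")` appends 은 because 국 ends in a consonant, while `attach_topic_particle("나")` appends 는. A real particle library also has to handle other particles, numerals, and non-Hangul endings, which this sketch does not attempt.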
