References
Note
Please edit this document if anything is incorrect or missing. Last updated on Sep 25, 2017.
Korean morpheme analyzer tools
When you analyze Korean text, the most basic task you need to perform is morphological analysis. Several libraries, written in various programming languages, are available for this:
C/C++
- MeCab-ko (2013) - By Yong-woon Lee and Youngho Yoo GPL LGPL BSD
- UTagger (2012) - By Joon-Choul Shin, Cheol-Young Ock* (Ulsan University) GPL custom
- 신준철, 옥철영, 기분석 부분 어절 사전을 활용한 한국어 형태소 분석기 (A Korean Morphological Analyzer using a Pre-analyzed Partial Word-phrase Dictionary), Journal of KIISE: Software and Applications, Vol. 39, No. 5, 2012.
- 신준철, 옥철영, 한국어 품사 및 동형이의어 태깅을 위한 단계별 전이모델 (A Stage Transition Model for Korean Part-of-Speech and Homograph Tagging), Journal of KIISE: Software and Applications, Vol. 39, No. 11, 2012.
- slides
- MACH (2002) - By Kwangseob Shim (Sungshin Women's University) custom
- Kwangseob Shim, Jaehyung Yang, MACH: A Supersonic Korean Morphological Analyzer, ACL, 2002.
- KTS (1995) - By 이상호, 서정연, 오영환 (KAIST) GPL v2
- 이상호, KTS: Korean Tagging System Manual (Version 0.9)
- 김재훈, 서정연, 자연언어 처리를 위한 한국어 품사 태그 (A Korean part-of-speech tag set for natural language processing), 1993.
- Created in 1995, released in 2002. [1]
Java/Scala
- twitter-korean-text (2014) - By Will Hohyon Ryu (Twitter) Apache v2
- KOMORAN (2013) - By 신준수 (shineware) Apache v2
- KKMA (2010) - By Sang-goo Lee*, Dongjoo Lee, et al. (Seoul National University) GPL v2
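- 이동주, 연종흠, 황인범, 이상구, 꼬꼬마: 관계형 데이터베이스를 활용한 세종 말뭉치 활용 도구 (Kkokkoma: a tool for utilizing the Sejong Corpus with a relational database), Journal of KIISE: Computing Practices and Letters, Vol. 16, No. 11, 2010.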
Python
Others
- K-LIWC (Ajou University)
- KRISTAL-IRMS (KISTI)
- Korean XTAG (UPenn)
- HAM (Kookmin University)
- POSTAG/K (POSTECH)
Corpora
- Korea University Korean Corpus, 1995.
- 10M tokens of Korean text from the 1970s to the 1990s
- HANTEC 2.0, KISTI & Chungnam National University, 1998-2003.
- 120,000 test documents (237MB)
- 50 TREC-type questions for QA (48KB)
- HKIB-40075, KISTI & Hankook Ilbo, 2002.
- 40,075 test documents for text categorization (88MB)
- KAIST Corpus, KAIST, 1997-2005.
- Sejong Corpus, National Institute of the Korean Language, 1998-2007.
- Yonsei Corpus, Yonsei University, 1987.
- 42M tokens of Korean text from the 1960s onward
- BoRA Language Resource Bank (언어자원은행), KAIST
Other NLP tools
- Hangulize - By Heungsub Lee Python
- Tool for transcribing 38+ languages into Hangul
- Hanja - By Sumin Byeon Python
- Hanja-to-Hangul transcriber
- Jamo - By Joshua Dong Python
- Hangul syllable decomposition and synthesis
- KoreanParser - By DongHyun Choi, Jungyeul Park, Key-Sun Choi (KAIST) Java
- Language parser
- Korean - By Heungsub Lee Python
- Package for attaching particles (josa) in sentences
- Speller (Pusan National University)
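
Two of the tasks named in the list above, Hangul syllable decomposition and synthesis, and attaching particles (josa), can be sketched with nothing but Unicode arithmetic. The block below is a minimal illustration, not the API of Jamo, Korean, or any other listed package: the names `decompose`, `compose`, and `attach_topic_particle` are hypothetical, and the particle helper covers only the 은/는 pair and assumes the word ends in a precomposed Hangul syllable.

```python
# Precomposed Hangul syllables occupy U+AC00..U+D7A3 and are laid out as:
#   code = 0xAC00 + (lead * 21 + vowel) * 28 + tail
HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3
NUM_VOWELS = 21
NUM_TAILS = 28  # index 0 means "no final consonant"


def decompose(syllable):
    """Return (lead, vowel, tail) indices of one precomposed Hangul syllable."""
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError("not a precomposed Hangul syllable: %r" % syllable)
    lead, rest = divmod(code - HANGUL_BASE, NUM_VOWELS * NUM_TAILS)
    vowel, tail = divmod(rest, NUM_TAILS)
    return lead, vowel, tail


def compose(lead, vowel, tail=0):
    """Inverse of decompose: build a syllable from its three indices."""
    return chr(HANGUL_BASE + (lead * NUM_VOWELS + vowel) * NUM_TAILS + tail)


def attach_topic_particle(word):
    """Hypothetical helper: append 은 after a final consonant, 는 otherwise."""
    _, _, tail = decompose(word[-1])
    return word + ("은" if tail else "는")


if __name__ == "__main__":
    print(decompose("한"))
    print(compose(*decompose("한")))
    print(attach_topic_particle("사람"), attach_topic_particle("나"))
```

Real particle handling has more cases (for example, 으로/로 treats a final ㄹ specially, and words ending in digits or Latin letters need their own rules), which is where a dedicated package becomes useful.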
[1] https://wiki.kldp.org/wiki.php/KTS