References

Note

Please modify this document if anything is erroneous or missing. Last updated on October 24, 2015.

Korean morphological analyzers

Morphological analysis is the most basic step when analyzing Korean text. Several libraries, written in a variety of programming languages, are available for this:

MeCab-ko (2013) - By Yong-woon Lee and Youngho Yoo GPL LGPL BSD
UTagger (2012) - By Joon-Choul Shin, Cheol-Young Ock* (Ulsan University) GPL custom
- 신준철, 옥철영, 기분석 부분 어절 사전을 활용한 한국어 형태소 분석기 (A Korean Morphological Analyzer using a Pre-analyzed Partial Word-phrase Dictionary), Journal of KIISE: Software and Applications, Vol. 39, No. 5, 2012.
- 신준철, 옥철영, 한국어 품사 및 동형이의어 태깅을 위한 단계별 전이모델 (A Stage Transition Model for Korean Part-of-Speech and Homograph Tagging), Journal of KIISE: Software and Applications, Vol. 39, No. 11, 2012.
- slides
MACH (2002) - By Kwangseob Shim (Sungshin Women's University) custom
- Kwangseob Shim, Jaehyung Yang, MACH: A Supersonic Korean Morphological Analyzer, ACL, 2002.
KTS (1995) - By 이상호, 서정연, 오영환 (KAIST) GPL v2
- 이상호, KTS: Korean Tagging System Manual (Version 0.9)
- 김재훈, 서정연, 자연언어 처리를 위한 한국어 품사 태그 (A Korean part-of-speech tag set for natural language processing), 1993.
- Created in 1995, released in 2002. [1]

twitter-korean-text (2014) - By Will Hohyon Ryu (Twitter) Apache v2
KOMORAN (2013) - By 신준수 (shineware) Apache v2
KKMA (2010) - By Sang-goo Lee*, Dongjoo Lee, et al. (Seoul National University) GPL v2
- 이동주, 연종흠, 황인범, 이상구, 꼬꼬마: 관계형 데이터베이스를 활용한 세종 말뭉치 활용 도구, Journal of KIISE: Computing Practices and Letters, Vol. 16, No. 11, 2010.
Arirang (2009) - By SooMyung Lee Apache v2
- code
HanNanum (1999) - By Key-Sun Choi* et al. (KAIST) GPL v3
- code, docs

KoNLPy (2014) GPL v3+
- By Lucy Park (Seoul National University)
- Wrapper for Hannanum, KKMA, KOMORAN, twitter-korean-text, MeCab-ko
- Tools for Hangul/Korean manipulation
UMorpheme (2014) MIT
- By 김경훈 (UNIST)
- Wrapper for MeCab-ko for online usage

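Korea University Korean Corpus, 1995
- 10 million eojeol (word phrases) of Korean from the 1970s to the 1990s
HANTEC 2.0, KISTI & Chungnam National University, 1998-2003.
- 120,000 test documents (237MB)
- 50 TREC-style queries for QA
HKIB-40075, KISTI & Hankook Ilbo, 2002.
- 40,075 test documents for text classification (88MB)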
KAIST Corpus, KAIST, 1997-2005.
Sejong Corpus, National Institute of the Korean Language, 1998-2007.
Yonsei Corpus, Yonsei University, 1987.
- 42 million eojeol (word phrases) of Korean from 1960 onward
BoRA Language Resource Bank, KAIST

Hangulize - By Heungsub Lee Python
- Tool for transcribing words from 38+ languages into Hangul
Hanja - By Sumin Byeon Python
- Hanja to Hangul transcriber
Jamo - By Joshua Dong Python
- Hangul syllable decomposition and synthesis
KoreanParser - By DongHyun Choi, Jungyeul Park, Key-Sun Choi (KAIST) Java
- Language parser
Korean - By Heungsub Lee Python
- Package for attaching particles (josa) in sentences
go_hangul (2012) - By Homin Lee Go BSD
- Tools for Hangul manipulation [docs]
Speller (Pusan National University)
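
Several of the tools listed above (Jamo for syllable decomposition and synthesis, Korean for attaching particles) build on the same Unicode arithmetic for precomposed Hangul syllables. The following is a minimal standard-library sketch of that arithmetic, not the API of any of these packages: the names `decompose`, `compose`, and `attach_topic_particle` are hypothetical, and the particle helper only covers the 은/는 pair as an illustration.

```python
# Sketch of Hangul syllable arithmetic using only the standard library.
# All function names here are illustrative assumptions, not taken from
# the Jamo or Korean packages.

HANGUL_BASE = 0xAC00   # code point of the first precomposed syllable
HANGUL_LAST = 0xD7A3   # code point of the last precomposed syllable
NUM_VOWELS = 21
NUM_TAILS = 28         # 27 final consonants plus "no final consonant"

# Conjoining jamo: leading consonants, vowels, and trailing consonants.
LEADS = [chr(0x1100 + i) for i in range(19)]
VOWELS = [chr(0x1161 + i) for i in range(NUM_VOWELS)]
TAILS = [""] + [chr(0x11A8 + i) for i in range(NUM_TAILS - 1)]


def decompose(syllable):
    """Split one precomposed syllable into (lead, vowel, tail) jamo."""
    if len(syllable) != 1 or not HANGUL_BASE <= ord(syllable) <= HANGUL_LAST:
        raise ValueError("not a precomposed Hangul syllable: %r" % syllable)
    index = ord(syllable) - HANGUL_BASE
    lead, rest = divmod(index, NUM_VOWELS * NUM_TAILS)
    vowel, tail = divmod(rest, NUM_TAILS)
    return LEADS[lead], VOWELS[vowel], TAILS[tail]


def compose(lead, vowel, tail=""):
    """Build a precomposed syllable from conjoining jamo."""
    index = (LEADS.index(lead) * NUM_VOWELS + VOWELS.index(vowel)) * NUM_TAILS
    return chr(HANGUL_BASE + index + TAILS.index(tail))


def attach_topic_particle(word):
    """Append 은 if the last syllable has a final consonant, else 는."""
    _, _, tail = decompose(word[-1])
    return word + ("은" if tail else "는")
```

For example, `compose(*decompose("한"))` round-trips to the original syllable, and `attach_topic_particle` chooses the particle by checking whether the last syllable ends in a final consonant. Real libraries handle many more cases (compatibility jamo, non-Hangul input, the full set of particles).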

[1]	https://wiki.kldp.org/wiki.php/KTS