KoNLPy: Korean NLP in Python


KoNLPy (pronounced “ko en el PIE”) is a Python package for natural language processing (NLP) of the Korean language. For installation directions, see the installation page.

If you are new to NLP, start with Getting started. For step-by-step instructions, follow the User guide. For specific descriptions of each module, see the API documentation.

>>> from konlpy.tag import Kkma
>>> from konlpy.utils import pprint
>>> kkma = Kkma()
>>> pprint(kkma.sentences(u'네, 안녕하세요. 반갑습니다.'))
[네, 안녕하세요..,
 반갑습니다.]
>>> pprint(kkma.nouns(u'질문이나 건의사항은 깃헙 이슈 트래커에 남겨주세요.'))
[질문,
 건의,
 건의사항,
 사항,
 깃헙,
 이슈,
 트래커]
>>> pprint(kkma.pos(u'오류보고는 실행환경, 에러메세지와함께 설명을 최대한상세히!^^'))
[(오류, NNG),
 (보고, NNG),
 (는, JX),
 (실행, NNG),
 (환경, NNG),
 (,, SP),
 (에러, NNG),
 (메세지, NNG),
 (와, JKM),
 (함께, MAG),
 (설명, NNG),
 (을, JKO),
 (최대한, NNG),
 (상세히, MAG),
 (!, SF),
 (^^, EMO)]
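As the last call above shows, `kkma.pos()` returns a list of `(morpheme, tag)` pairs, which is easy to post-process with plain Python. The following is a minimal sketch, not part of KoNLPy itself: `keep_tags` and `count_tags` are hypothetical helper names, and the input pairs are hard-coded from the output above so the snippet runs without KoNLPy or a Java runtime installed. It assumes Kkma noun tags all start with `NN` (as `NNG` does here).

```python
from collections import Counter

# (morpheme, tag) pairs copied from the kkma.pos() output above,
# hard-coded so this sketch runs without KoNLPy or a Java runtime.
tagged = [
    ('오류', 'NNG'), ('보고', 'NNG'), ('는', 'JX'),
    ('실행', 'NNG'), ('환경', 'NNG'), (',', 'SP'),
    ('에러', 'NNG'), ('메세지', 'NNG'), ('와', 'JKM'),
    ('함께', 'MAG'), ('설명', 'NNG'), ('을', 'JKO'),
    ('최대한', 'NNG'), ('상세히', 'MAG'), ('!', 'SF'), ('^^', 'EMO'),
]


def keep_tags(pairs, prefix):
    """Hypothetical helper: return morphemes whose tag starts with `prefix`."""
    return [morph for morph, tag in pairs if tag.startswith(prefix)]


def count_tags(pairs):
    """Hypothetical helper: count how often each tag occurs."""
    return Counter(tag for _, tag in pairs)


# Assumption: noun tags in this output share the "NN" prefix (e.g. NNG).
nouns = keep_tags(tagged, 'NN')
print(nouns)
print(count_tags(tagged).most_common(1))  # the single most frequent tag
```

With KoNLPy and Java installed, the same filter could be applied to live output, e.g. `keep_tags(kkma.pos(text), 'NN')`. Filtering by tag prefix rather than an exact tag keeps all noun subtypes without listing them one by one.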

Standing on the shoulders of giants

Korean, the 13th most widely spoken language in the world, is a beautiful yet complex language. Numerous researchers have built a myriad of Korean morpheme analyzers to computationally extract meaningful features from this labyrinthine text.

KoNLPy does not aim to be just another analyzer, but to unify these tools, build upon their shoulders, and see one step further. It is written in Python, not only because of the language’s simplicity and elegance, but also because of its powerful string processing modules and its applicability to various tasks, including crawling, Web programming, and data analysis.

The three main philosophies of this project are:

Please report when you think any have gone stale.

License

KoNLPy is Open Source Software, and is released under the license below:

You are welcome to use the code under the terms of the license; however, please acknowledge its use with a citation.

Here is a BibTeX entry:

@inproceedings{park2014konlpy,
  title={KoNLPy: Korean natural language processing in Python},
  author={Park, Eunjeong L. and Cho, Sungzoon},
  booktitle={Proceedings of the 26th Annual Conference on Human & Cognitive Language Technology},
  address={Chuncheon, Korea},
  month={October},
  year={2014}
}

Contribute

KoNLPy isn’t perfect, but it will continuously evolve and you are invited to participate!

Found a bug? Have a good idea for improving KoNLPy? Visit the KoNLPy GitHub page and suggest an idea or make a pull request.

You are also welcome to join the #koreannlp channel on the Ozinger IRC Network, as well as the mailing list. The IRC channel is more focused on development discussions and the mailing list is a better place to ask questions, though nobody will stop you from doing it the other way around.

Please note that asking questions through these channels is also a great contribution, because it gives the community feedback as well as ideas. Don’t hesitate to ask.


[1] With clear and brief documents.
[2] No, I’m not extremely fond of this either. However, some important dependencies, such as Hannanum, Kkma, and MeCab-ko, are GPL licensed, and we want to honor their licenses. (It is also an inevitable choice. We hope things may change in the future.)
