KoNLPy is a Python package for natural language processing (NLP) of the Korean language. For installation directions, see the installation guide.
>>> from konlpy.tag import Kkma
>>> from konlpy.utils import pprint
>>> kkma = Kkma()
>>> pprint(kkma.sentences(u'저는 대학생이구요. 소프트웨어 관련학과 입니다.'))
[저는 대학생이구요., 소프트웨어 관련학과 입니다.]
>>> pprint(kkma.nouns(u'대학에서 DB, 통계학, 이산수학 등을 배웠지만...'))
[대학, 통계학, 이산, 이산수학, 수학, 등]
>>> pprint(kkma.pos(u'자주 사용을 안하다보니 모두 까먹은 상태입니다.'))
[(자주, MAG), (사용, NNG), (을, JKO), (안하, VV), (다, ECS), (보, VXV), (니, ECD), (모두, MAG), (까먹, VV), (은, ETD), (상태, NNG), (이, VCP), (ㅂ니다, EFN), (., SF)]
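The `pos()` call in the example above returns a list of `(morpheme, tag)` pairs, which makes post-processing with plain Python straightforward. The sketch below copies that output as literal data, so it runs without KoNLPy or a Java runtime installed, and filters it by tag prefix. The helper name `filter_by_tag_prefix` is hypothetical, and it assumes that Kkma's noun tags all begin with `NN` (as `NNG` does here). It is not a replacement for `kkma.nouns()`, which, as shown above, also splits compounds such as 이산수학 into 이산 and 수학.

```python
# Output of kkma.pos(...) from the example above, copied as literal data so
# this sketch runs without KoNLPy or a Java runtime installed.
tagged = [
    ("자주", "MAG"), ("사용", "NNG"), ("을", "JKO"), ("안하", "VV"),
    ("다", "ECS"), ("보", "VXV"), ("니", "ECD"), ("모두", "MAG"),
    ("까먹", "VV"), ("은", "ETD"), ("상태", "NNG"), ("이", "VCP"),
    ("ㅂ니다", "EFN"), (".", "SF"),
]


def filter_by_tag_prefix(pairs, prefix):
    """Hypothetical helper: keep morphemes whose tag starts with `prefix`.

    Assumes `pairs` is a list of (morpheme, tag) tuples, as returned by pos().
    """
    return [morph for morph, tag in pairs if tag.startswith(prefix)]


print(filter_by_tag_prefix(tagged, "NN"))  # ['사용', '상태']
print(filter_by_tag_prefix(tagged, "VV"))  # ['안하', '까먹']
```

Matching on a prefix rather than an exact tag lets one call cover a whole tag family, at the cost of depending on how the tagset names its tags.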
For more on how to use KoNLPy, see the API documentation.
Korean, the 13th most widely spoken language in the world, is a beautiful yet complex language. Numerous researchers have built Korean NLP engines to computationally extract meaningful features from its labyrinthine text.
KoNLPy does not aim to be just another such engine; instead, it aims to unify the existing ones, build upon their shoulders, and see one step further. It is written in the Python programming language, not only because of the language’s simplicity and elegance, but also because of its powerful string processing modules and its applicability to various tasks, including crawling, web programming, and data analysis.
The three main philosophies of this project are:
Please let us know if you think any of them have gone stale.
With clear and brief documents.
No, I’m not extremely fond of the GPL either. However, some important dependencies, such as Hannanum, Kkma, and MeCab-ko, are GPL-licensed, and we want to honor their licenses. (It is also an inevitable choice. We hope things may change in the future.)