konlpy Package

data Module


konlpy.data.find(resource_url)

Find the path of a given resource URL by searching through the directories in konlpy.data.path. If the resource is not found, raises a LookupError whose message points to the installation instructions for konlpy.download().

Parameters: resource_url (str) – The URL of the resource to search for. URLs are POSIX-style relative path names, such as corpora/kolaw. In particular, directory names should always be separated by the forward slash character (i.e., ‘/’), which KoNLPy automatically converts to the platform-appropriate path separator.
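The lookup described above can be sketched as a loop over the search directories. find_resource is a hypothetical helper, not KoNLPy's own code; it assumes a resource counts as found when the joined path exists on disk, and the error message wording is invented (only the exception type comes from the description).

```python
import os


def find_resource(resource_url, search_path):
    """Return the first existing path for a POSIX-style resource URL.

    Hypothetical stand-in for konlpy.data.find(); `search_path` plays the
    role of konlpy.data.path and is scanned in order.
    """
    # 'corpora/kolaw' -> os.path.join('corpora', 'kolaw'), so the
    # platform's own separator is used.
    relative = os.path.join(*resource_url.split('/'))
    for directory in search_path:
        candidate = os.path.join(directory, relative)
        if os.path.exists(candidate):
            return candidate  # earlier directories shadow later ones
    # Assumed message wording; only the exception type comes from the docs.
    raise LookupError(
        'Resource %r not found. Use konlpy.download() to install it.'
        % resource_url)
```

Because the first match wins, placing a directory at the front of the search list lets it override a resource of the same name further down, which is what allows users to substitute their own versions of resources.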
konlpy.data.load(resource_url, format='auto')

Load a given resource from the KoNLPy data package. If no format is specified, load() will attempt to determine a format based on the resource name’s file extension. If that fails, load() will raise a ValueError exception.

  • resource_url (str) – A URL specifying where the resource should be loaded from.
  • format – Format type of resource.
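One way to picture format='auto' is a lookup on the file extension that falls back to a ValueError. The extension table and the guess_format name below are hypothetical; this documentation does not list which formats load() recognizes.

```python
import os

# Hypothetical extension table: the real set of formats load() understands
# is not listed in this documentation.
FORMATS = {'.txt': 'text', '.json': 'json', '.csv': 'csv'}


def guess_format(resource_url, format='auto'):
    """Resolve format='auto' from the resource's file extension."""
    if format != 'auto':
        return format  # an explicit format is used unchanged
    ext = os.path.splitext(resource_url)[1].lower()
    if ext not in FORMATS:
        raise ValueError('Could not determine the format of %r' % resource_url)
    return FORMATS[ext]
```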
konlpy.data.path = ['/home/docs/konlpy_data', '/usr/share/konlpy_data', '/usr/local/share/konlpy_data', '/usr/lib/konlpy_data', '/usr/local/lib/konlpy_data', '/home/docs/checkouts/readthedocs.org/user_builds/konlpy/checkouts/v0.4.4/konlpy/data']

A list of directories where the KoNLPy data package might reside. These directories will be checked in order when looking for a resource. Note that this allows users to substitute their own versions of resources.

class konlpy.data.FileSystemPathPointer(path)

Bases: konlpy.data.PathPointer, str

A path pointer that identifies a file by an absolute path.

class konlpy.data.PathPointer

Bases: object

An abstract base class for path pointers. One subclass exists: FileSystemPathPointer, which identifies a file by an absolute path.
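Because FileSystemPathPointer derives from both PathPointer and str, a pointer can be passed anywhere a path string is expected. The sketch below reproduces only that class layout; normalizing with os.path.abspath is an assumption, and the real class may check its argument differently.

```python
import os
from abc import ABC


class PathPointer(ABC):
    """Abstract base for path pointers (declares no abstract methods here)."""


class FileSystemPathPointer(PathPointer, str):
    """A str subclass holding an absolute file-system path."""

    def __new__(cls, path):
        # Assumption: relative paths are made absolute against the current
        # working directory.
        return super().__new__(cls, os.path.abspath(path))
```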


downloader Module

class konlpy.downloader.Downloader(download_dir=None)

Bases: object

A class used to access the KoNLPy data server, which can be used to download packages.

INDEX_URL = 'http://konlpy.github.io/konlpy-data/index.json'
INSTALLED = 'installed'
NOT_INSTALLED = 'not installed'
PACKAGE_URL = 'http://konlpy.github.io/konlpy-data/packages/%s.%s'
SCRIPT_URL = 'http://konlpy.github.io/konlpy-data/packages/%s.sh'
STALE = 'corrupt or out of date'
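The two URL templates are ordinary %-format strings. Which values the Downloader substitutes is not documented here; the sketch below assumes a package name and a file extension for PACKAGE_URL, and 'kobill' and 'zip' are illustrative values only.

```python
# Values copied from the Downloader class attributes listed above.
PACKAGE_URL = 'http://konlpy.github.io/konlpy-data/packages/%s.%s'
SCRIPT_URL = 'http://konlpy.github.io/konlpy-data/packages/%s.sh'

# Assumption: PACKAGE_URL takes (package name, file extension).
# 'kobill' and 'zip' are illustrative, not taken from the downloader's code.
package_url = PACKAGE_URL % ('kobill', 'zip')
script_url = SCRIPT_URL % 'kobill'
```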
download(id=None, download_dir=None)

The KoNLPy data downloader. With this module you can download corpora, models and other data packages that can be used with KoNLPy.

Individual packages can be downloaded by passing a single argument, the package identifier for the package that should be downloaded:

>>> download('corpus/kobill')
[konlpy_data] Downloading package 'kobill'...
[konlpy_data]   Unzipping corpora/kobill.zip.

To download all packages, simply call download with the argument ‘all’:

>>> download('all')
[konlpy_data] Downloading package 'kobill'...
[konlpy_data]   Unzipping corpora/kobill.zip.
status(info_or_id=None, download_dir=None)

Returns the directory to which packages will be downloaded by default. This value can be overridden using the constructor, or on a case-by-case basis using the download_dir argument when calling download().

On Windows, the default download directory is PYTHONHOME/lib/konlpy, where PYTHONHOME is the directory containing Python (e.g., C:\Python27).

On all other platforms, the default directory is the first of the following which exists or which can be created with write permission: /usr/share/konlpy_data, /usr/local/share/konlpy_data, /usr/lib/konlpy_data, /usr/local/lib/konlpy_data, ~/konlpy_data.
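The rule for non-Windows platforms can be sketched as a scan over candidate directories. pick_download_dir is a hypothetical helper, not part of konlpy.downloader; it reads the rule as "an existing directory the process can write to, or one it is able to create", and it creates the directory as a side effect.

```python
import os


def pick_download_dir(candidates):
    """Return the first usable candidate directory, or None.

    Hypothetical sketch: an existing directory qualifies if it is writable;
    a missing one qualifies if os.makedirs() succeeds.
    """
    for directory in candidates:
        if os.path.isdir(directory):
            if os.access(directory, os.W_OK):
                return directory
            continue  # exists but not writable: try the next candidate
        try:
            os.makedirs(directory)
        except OSError:
            continue  # no permission, or a file is in the way
        return directory
    return None
```

Called with ['/usr/share/konlpy_data', '/usr/local/share/konlpy_data', '/usr/lib/konlpy_data', '/usr/local/lib/konlpy_data', os.path.expanduser('~/konlpy_data')], it follows the order listed above.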

jvm Module


konlpy.jvm.init_jvm(jvmpath=None)

Initializes the Java virtual machine (JVM).

Parameters: jvmpath – The path of the JVM. If left empty, it is inferred by jpype.getDefaultJVMPath().

utils Module

class konlpy.utils.UnicodePrinter(indent=1, width=80, depth=None, stream=None)

Bases: pprint.PrettyPrinter

format(object, context, maxlevels, level)

Overridden method to enable Unicode pretty printing.


konlpy.utils.char2hex(c)

Converts a Unicode character to its hexadecimal code point.

>>> char2hex(u'음')
'0xc74c'
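The conversion amounts to taking the character's code point and formatting it in hexadecimal. A minimal Python 3 sketch, using a hypothetical char_to_hex name rather than KoNLPy's function:

```python
def char_to_hex(c):
    """Return the code point of one character as a '0x...' string."""
    # ord() gives the integer code point; hex() formats it with a 0x prefix.
    return hex(ord(c))
```

char_to_hex('음') returns '0xc74c', since 음 is U+C74C.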
konlpy.utils.concordance(phrase, text, show=False)

Find concordances of a phrase in a text.

The leftmost numbers are indices that give the location of the phrase in the text, counted in tokens. The string after each index is the part of the text surrounding the phrase at that position.

  • phrase – Phrase to search in the document.
  • text – Target document.
  • show – If True, shows locations of the phrase on the console.
>>> from konlpy.corpus import kolaw
>>> from konlpy import utils
>>> constitution = kolaw.open('constitution.txt').read()
>>> idx = utils.concordance(u'대한민국', constitution, show=True)
0       대한민국헌법 유구한 역사와
9       대한국민은 3·1운동으로 건립된 대한민국임시정부의 법통과 불의에
98      총강 제1조 ① 대한민국은 민주공화국이다. ②대한민국의
100     ① 대한민국은 민주공화국이다. ②대한민국의 주권은 국민에게
110     나온다. 제2조 ① 대한민국의 국민이 되는
126     의무를 진다. 제3조 대한민국의 영토는 한반도와
133     부속도서로 한다. 제4조 대한민국은 통일을 지향하며,
147     추진한다. 제5조 ① 대한민국은 국제평화의 유지에
787     군무원이 아닌 국민은 대한민국의 영역안에서는 중대한
1836    파견 또는 외국군대의 대한민국 영역안에서의 주류에
3620    경제 제119조 ① 대한민국의 경제질서는 개인과
>>> idx
[0, 9, 98, 100, 110, 126, 133, 147, 787, 1836, 3620]
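The sample output above is consistent with splitting the text on whitespace, keeping every token that contains the phrase, and printing up to three tokens before and two after each match. concordance_sketch is a hypothetical reimplementation under those assumptions, not KoNLPy's source.

```python
def concordance_sketch(phrase, text, show=False):
    """Return indices of whitespace-separated tokens containing phrase."""
    terms = text.split()
    indices = [i for i, term in enumerate(terms) if phrase in term]
    if show:
        for i in indices:
            # Up to 3 tokens before the match, the match, and 2 after it.
            context = ' '.join(terms[max(0, i - 3):i + 3])
            print('%d\t%s' % (i, context))
    return indices
```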
konlpy.utils.csvread(f, encoding=u'utf-8')

Reads a CSV file.

Parameters: f – File object.
>>> from konlpy.utils import csvread
>>> with open('some.csv', 'r') as f:
...     print(csvread(f))
[[u'이 / NR', u'차 / NNB'], [u'나가 / VV', u'네 / EFN']]
konlpy.utils.csvwrite(data, f)

Writes a CSV file.

Parameters: data – A list of lists.
>>> from konlpy.utils import csvwrite
>>> d = [[u'이 / NR', u'차 / NNB'], [u'나가 / VV', u'네 / EFN']]
>>> with open('some.csv', 'w') as f:
...     csvwrite(d, f)
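On Python 3, a similar round trip can be done with the standard csv module alone. This sketch writes to an io.StringIO buffer instead of some.csv and does not call csvread or csvwrite, so details such as encoding handling may differ from KoNLPy's functions.

```python
import csv
import io

rows = [['이 / NR', '차 / NNB'], ['나가 / VV', '네 / EFN']]

# Write the rows as CSV into an in-memory text buffer (a stand-in for a
# file opened with open('some.csv', 'w', encoding='utf-8', newline='')).
buf = io.StringIO()
csv.writer(buf).writerows(rows)

# Rewind the buffer and parse the rows back.
buf.seek(0)
read_back = list(csv.reader(buf))
```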

konlpy.utils.hex2char(h)

Converts a hexadecimal code point to a Unicode character.

>>> print(hex2char('c74c'))
음
>>> print(hex2char('0xc74c'))
음
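The inverse direction parses the hex string and maps the code point back to a character. hex_to_char is a hypothetical Python 3 sketch, not KoNLPy's implementation:

```python
def hex_to_char(h):
    """Return the character for a hex code point, with or without '0x'."""
    # int(..., 16) accepts both 'c74c' and '0xc74c'; chr() maps the code
    # point back to a one-character str.
    return chr(int(h, 16))
```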

konlpy.utils.load_txt(filename, encoding=u'utf-8')

Text file loader. To read a file, use read_txt() instead.

konlpy.utils.partition(list_, indices)

Partitions a list to several parts using indices.

  • list_ – The target list.
  • indices – Indices to partition the target list.
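Assuming the indices are cut points, so that each part runs up to (but not including) the next index, the partitioning can be sketched with slices. partition_sketch is a hypothetical name, not KoNLPy's function.

```python
def partition_sketch(list_, indices):
    """Split list_ at each index in indices (assumed sorted ascending)."""
    # [0, i1, i2, ..., None] pairs up into consecutive slice bounds.
    bounds = [0] + list(indices) + [None]
    return [list_[start:end] for start, end in zip(bounds, bounds[1:])]
```

For example, partition_sketch([1, 2, 3, 4, 5], [2, 4]) returns [[1, 2], [3, 4], [5]].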

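konlpy.utils.pprint(obj, **kwargs)

Unicode pretty printer. On Python 2, the standard pprint module escapes non-ASCII characters in unicode strings; this function prints them as-is.

>>> import pprint, konlpy
>>> pprint.pprint([u"Print", u"유니코드", u"easily"])
[u'Print', u'\uc720\ub2c8\ucf54\ub4dc', u'easily']
>>> konlpy.utils.pprint([u"Print", u"유니코드", u"easily"])
['Print', '유니코드', 'easily']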
konlpy.utils.read_json(filename, encoding=u'utf-8')

JSON file reader.

konlpy.utils.read_txt(filename, encoding=u'utf-8')

Text file reader.
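Both readers take an encoding that defaults to UTF-8. A minimal sketch, assuming read_txt returns the whole file as one string and read_json returns the parsed JSON value (neither return type is stated above); the _sketch function names are hypothetical.

```python
import io
import json


def read_txt_sketch(filename, encoding='utf-8'):
    """Return the whole file, decoded with the given encoding."""
    with io.open(filename, encoding=encoding) as f:
        return f.read()


def read_json_sketch(filename, encoding='utf-8'):
    """Parse a JSON file decoded with the given encoding."""
    with io.open(filename, encoding=encoding) as f:
        return json.load(f)
```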


konlpy.utils.select(phrase)

Replaces some ambiguous punctuation marks with simpler ones.
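This documentation does not say which punctuation marks select() rewrites. The sketch below shows the general technique, a character-to-character table applied with str.translate, using a hypothetical table that maps curly quotation marks to straight ones; simplify_punctuation is likewise a hypothetical name.

```python
# Hypothetical replacement table; the marks KoNLPy's select() actually
# rewrites are not listed in this documentation.
REPLACEMENTS = {
    '\u2018': "'",  # left single quotation mark
    '\u2019': "'",  # right single quotation mark
    '\u201c': '"',  # left double quotation mark
    '\u201d': '"',  # right double quotation mark
}

_TABLE = str.maketrans(REPLACEMENTS)


def simplify_punctuation(phrase):
    """Replace each character found in REPLACEMENTS with its simpler form."""
    return phrase.translate(_TABLE)
```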