konlpy Package

data Module


Find the path of a given resource URL by searching through directories in konlpy.data.path. If the given resource is not found, raise a LookupError, whose message gives a pointer to the installation instructions for konlpy.download().

Parameters: resource_url (str) – The URL of the resource to search for. URLs are posix-style relative path names, such as corpora/kolaw. In particular, directory names should always be separated by the forward slash character (i.e., ‘/’), which will be automatically converted to a platform-appropriate path separator by KoNLPy.
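
The lookup described above can be sketched roughly as follows. This is an illustration only, not KoNLPy's actual implementation; the names find_resource and search_path are hypothetical.

```python
import os


def find_resource(resource_url, search_path):
    """Return the first existing path for resource_url under search_path.

    Hypothetical helper for illustration; not KoNLPy's actual code.
    """
    # Convert the posix-style URL into a platform-appropriate relative path.
    relative = os.path.join(*resource_url.split('/'))
    # Check each directory in order, so earlier entries take precedence.
    for directory in search_path:
        candidate = os.path.join(directory, relative)
        if os.path.exists(candidate):
            return candidate
    raise LookupError('Resource %r not found in: %s'
                      % (resource_url, ', '.join(search_path)))
```

Because directories are checked in order, a copy of a resource placed in an earlier directory shadows copies in later ones.
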
konlpy.data.load(resource_url, format='auto')

Load a given resource from the KoNLPy data package. If no format is specified, load() will attempt to determine a format based on the resource name’s file extension. If that fails, load() will raise a ValueError exception.

  • resource_url (str) – A URL specifying where the resource should be loaded from.
  • format – Format type of resource.
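
A minimal sketch of extension-based format detection, assuming a small lookup table. The table contents and the name guess_format are hypothetical; the formats KoNLPy actually recognizes may differ.

```python
import os

# Hypothetical extension-to-format table, for illustration only.
EXTENSION_FORMATS = {'.txt': 'text', '.json': 'json'}


def guess_format(resource_url, format='auto'):
    """Return an explicit format, or infer one from the file extension."""
    if format != 'auto':
        return format
    extension = os.path.splitext(resource_url)[1].lower()
    if extension not in EXTENSION_FORMATS:
        # Mirrors the documented behavior: an unknown format is a ValueError.
        raise ValueError('Could not determine format for %r' % resource_url)
    return EXTENSION_FORMATS[extension]
```
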

Lists the contents of the default KoNLPy data directory.

>>> import konlpy
>>> konlpy.listdir()

Clears the KoNLPy output data directory.

>>> import konlpy
>>> konlpy.clear()
konlpy.data.path = ['/home/docs/konlpy_data', '/usr/share/konlpy_data', '/usr/local/share/konlpy_data', '/usr/lib/konlpy_data', '/usr/local/lib/konlpy_data', '/home/docs/checkouts/readthedocs.org/user_builds/konlpy/checkouts/latest/konlpy/data']

A list of directories where the KoNLPy data package might reside. These directories will be checked in order when looking for a resource. Note that this allows users to substitute their own versions of resources.

class konlpy.data.FileSystemPathPointer(path)

Bases: konlpy.data.PathPointer, str

A path pointer that identifies a file by an absolute path.

class konlpy.data.PathPointer

Bases: object

An abstract base class for path pointers. One subclass exists: FileSystemPathPointer, which identifies a file by an absolute path.

class konlpy.data.CorpusReader(extension='.txt')

Bases: object


The read method reads all files listed in the items attribute and saves them into the corpus dictionary.

class konlpy.data.StringWriter(filename)

Bases: object


downloader Module

class konlpy.downloader.Downloader(download_dir=None)

Bases: object

A class used to access the KoNLPy data server, which can be used to download packages.

INDEX_URL = 'http://konlpy.github.io/konlpy-data/index.json'
INSTALLED = 'installed'
NOT_INSTALLED = 'not installed'
PACKAGE_URL = 'http://konlpy.github.io/konlpy-data/packages/%s.%s'
SCRIPT_URL = 'http://konlpy.github.io/konlpy-data/packages/%s.sh'
STALE = 'corrupt or out of date'
download(id=None, download_dir=None)

The KoNLPy data downloader. With this module you can download corpora, models and other data packages that can be used with KoNLPy.

Individual packages can be downloaded by passing a single argument, the package identifier for the package that should be downloaded:

>>> download('corpus/kobill')
[konlpy_data] Downloading package 'kobill'...
[konlpy_data]   Unzipping corpora/kobill.zip.

To download all packages, simply call download with the argument ‘all’:

>>> download('all')
[konlpy_data] Downloading package 'kobill'...
[konlpy_data]   Unzipping corpora/kobill.zip.
status(info_or_id=None, download_dir=None)

Returns the directory to which packages will be downloaded by default. This value can be overridden using the constructor, or on a case-by-case basis using the download_dir argument when calling download().

On Windows, the default download directory is PYTHONHOME/lib/konlpy, where PYTHONHOME is the directory containing Python, e.g., C:\Python27.

On all other platforms, the default directory is the first of the following which exists or which can be created with write permission: /usr/share/konlpy_data, /usr/local/share/konlpy_data, /usr/lib/konlpy_data, /usr/local/lib/konlpy_data, ~/konlpy_data.
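
The selection rule for non-Windows platforms can be sketched as below. This is an illustration under stated assumptions, not KoNLPy's code: the name first_usable_dir is hypothetical, and "can be created with write permission" is approximated by checking that the parent directory exists and is writable.

```python
import os


def first_usable_dir(candidates):
    """Return the first candidate that exists or looks creatable, else None.

    Hypothetical helper for illustration; not KoNLPy's actual code.
    """
    for candidate in candidates:
        path = os.path.expanduser(candidate)
        if os.path.exists(path):
            return path
        # Approximation: the directory is creatable if its parent is writable.
        parent = os.path.dirname(path.rstrip(os.sep)) or os.curdir
        if os.path.isdir(parent) and os.access(parent, os.W_OK):
            return path
    return None
```
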

jvm Module

konlpy.jvm.init_jvm(jvmpath=None, max_heap_size=1024)

Initializes the Java virtual machine (JVM).

  • jvmpath – The path of the JVM. If left empty, inferred by jpype.getDefaultJVMPath().
  • max_heap_size – Maximum heap size in megabytes. Default is 1024 (1 GB). If this value is too small, the JVM may run out of memory; we recommend setting it to 1024 to 2048 or more. If it is too large, memory may be used inefficiently.
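
JVMs conventionally take their maximum heap size through the -Xmx option. A minimal sketch of turning a megabyte count into that option, assuming init_jvm passes max_heap_size along in this form (an assumption; the helper name heap_option is hypothetical):

```python
def heap_option(max_heap_size=1024):
    """Format a megabyte limit as a JVM -Xmx option string."""
    if max_heap_size <= 0:
        raise ValueError('max_heap_size must be a positive number of megabytes')
    # The trailing "m" tells the JVM the value is in megabytes.
    return '-Xmx%dm' % max_heap_size
```
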

utils Module

class konlpy.utils.PropagatingThread(group=None, target=None, name=None, args=(), kwargs=None, *, daemon=None)

Bases: threading.Thread

PropagatingThread is just a fancy wrapper for Thread to manage exceptions.

self.exception: Exception defined in higher-level.
self.ret: Thread target object.

Wait until the thread terminates.

This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception or until the optional timeout occurs.

When the timeout argument is present and not None, it should be a floating point number specifying a timeout for the operation in seconds (or fractions thereof). As join() always returns None, you must call is_alive() after join() to decide whether a timeout happened – if the thread is still alive, the join() call timed out.

When the timeout argument is not present or None, the operation will block until the thread terminates.

A thread can be join()ed many times.

join() raises a RuntimeError if an attempt is made to join the current thread as that would cause a deadlock. It is also an error to join() a thread before it has been started and attempts to do so raises the same exception.


Method representing the thread’s activity.

You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.
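
One common way to build such an exception-propagating thread is to catch the exception in run() and re-raise it in join(). The sketch below illustrates that pattern; it is not KoNLPy's source. The class name ExceptionCapturingThread is hypothetical, and treating ret as the target's return value is an assumption.

```python
import threading


class ExceptionCapturingThread(threading.Thread):
    """Hypothetical stand-in showing one way to surface worker exceptions."""

    def __init__(self, target=None, args=(), kwargs=None, **thread_options):
        super().__init__(**thread_options)
        self.fn = target
        self.fn_args = args
        self.fn_kwargs = kwargs or {}
        self.exception = None  # Exception raised by the target, if any.
        self.ret = None        # Return value of the target (assumption).

    def run(self):
        # Run the target, remembering either its result or its exception.
        try:
            if self.fn is not None:
                self.ret = self.fn(*self.fn_args, **self.fn_kwargs)
        except BaseException as exc:
            self.exception = exc

    def join(self, timeout=None):
        super().join(timeout)
        # Re-raise in the calling thread whatever the target raised.
        if self.exception is not None:
            raise self.exception
        return self.ret
```

With a plain Thread, an exception in the target is only reported on the worker thread; with this pattern the caller sees it when it calls join().
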


Converts a unicode character to hex.

>>> char2hex(u'음')
konlpy.utils.concordance(phrase, text, show=False)

Find concordances of a phrase in a text.

The leftmost numbers are indices that indicate the location of the phrase in the text, counted in tokens. The string that follows each index is the part of the text surrounding the phrase at that position.

  • phrase – Phrase to search in the document.
  • text – Target document.
  • show – If True, shows locations of the phrase on the console.
>>> from konlpy.corpus import kolaw
>>> from konlpy.tag import Mecab
>>> from konlpy import utils
>>> constitution = kolaw.open('constitution.txt').read()
>>> idx = utils.concordance(u'대한민국', constitution, show=True)
0       대한민국헌법 유구한 역사와
9       대한국민은 3·1운동으로 건립된 대한민국임시정부의 법통과 불의에
98      총강 제1조 ① 대한민국은 민주공화국이다. ②대한민국의
100     ① 대한민국은 민주공화국이다. ②대한민국의 주권은 국민에게
110     나온다. 제2조 ① 대한민국의 국민이 되는
126     의무를 진다. 제3조 대한민국의 영토는 한반도와
133     부속도서로 한다. 제4조 대한민국은 통일을 지향하며,
147     추진한다. 제5조 ① 대한민국은 국제평화의 유지에
787     군무원이 아닌 국민은 대한민국의 영역안에서는 중대한
1836    파견 또는 외국군대의 대한민국 영역안에서의 주류에
3620    경제 제119조 ① 대한민국의 경제질서는 개인과
>>> idx
[0, 9, 98, 100, 110, 126, 133, 147, 787, 1836, 3620]
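
The token-based indexing seen in the output above can be approximated with a short sketch. It assumes tokens are whitespace-separated and that a token matches when it contains the phrase; the names find_concordance and show_context are hypothetical, and this is not KoNLPy's implementation.

```python
def find_concordance(phrase, text):
    """Return indices of whitespace-separated tokens that contain phrase."""
    tokens = text.split()
    return [i for i, token in enumerate(tokens) if phrase in token]


def show_context(index, tokens, width=2):
    """Join the tokens within `width` positions of `index` into one string."""
    start = max(0, index - width)
    return ' '.join(tokens[start:index + width + 1])
```
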

Deletes links from the input string.

Parameters: string (str) – The string to delete links from.
Returns: str – The string without links.

Deletes at marks from the input string.

Parameters: string (str) – The string to delete at marks from.
Returns: str – The string without at marks.
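
A rough sketch of how link and at-mark removal can be done with regular expressions. The patterns and the names strip_links and strip_mentions are assumptions made for illustration; KoNLPy's own expressions may differ.

```python
import re

# Assumed patterns for illustration; KoNLPy's own expressions may differ.
LINK_PATTERN = re.compile(r'(?:https?://|www\.)\S+')
MENTION_PATTERN = re.compile(r'@\w+')


def strip_links(string):
    """Remove anything that looks like a URL."""
    return LINK_PATTERN.sub('', string)


def strip_mentions(string):
    """Remove @name style mentions (this also clips e-mail addresses)."""
    return MENTION_PATTERN.sub('', string)
```
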

Converts a hex character to unicode.

>>> print(hex2char('c74c'))

>>> print(hex2char('0xc74c'))
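
The two conversions can be reproduced with Python built-ins. This is a sketch of the idea, not KoNLPy's source; the names char_to_hex and hex_to_char are hypothetical.

```python
def char_to_hex(char):
    """Return the code point of a single character as a hex string."""
    return hex(ord(char))


def hex_to_char(hex_string):
    """Return the character for a hex code point, with or without '0x'."""
    return chr(int(hex_string, 16))
```
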

konlpy.utils.load_txt(filename, encoding='utf-8')

Text file loader. To read a file, use read_txt() instead.

konlpy.utils.partition(list_, indices)

Partitions a list to several parts using indices.

  • list_ – The target list.
  • indices – Indices to partition the target list.
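
A minimal sketch of index-based partitioning, assuming each index marks the start of a new part. The name split_at is hypothetical, and the library's handling of edge cases may differ.

```python
def split_at(items, indices):
    """Split items into consecutive slices that begin at each index."""
    boundaries = [0] + list(indices) + [len(items)]
    return [items[start:end]
            for start, end in zip(boundaries, boundaries[1:])]
```
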
konlpy.utils.read_json(filename, encoding='utf-8')

JSON file reader.

konlpy.utils.read_txt(filename, encoding='utf-8')

Text file reader.
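
Readers with this signature can be written in a few lines using the standard library. The sketch below is illustrative rather than KoNLPy's code; the names read_text_file and read_json_file are hypothetical.

```python
import io
import json


def read_text_file(filename, encoding='utf-8'):
    """Return the whole file as one decoded string."""
    with io.open(filename, 'r', encoding=encoding) as f:
        return f.read()


def read_json_file(filename, encoding='utf-8'):
    """Parse the file as JSON and return the resulting object."""
    with io.open(filename, 'r', encoding=encoding) as f:
        return json.load(f)
```
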


Replaces some ambiguous punctuation marks with simpler ones.
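
As an illustration of this kind of normalization, the sketch below maps a few typographic marks to ASCII equivalents. The mapping and the name simplify_punctuation are hypothetical; the marks KoNLPy actually replaces are not listed here.

```python
# Hypothetical mapping for illustration; not KoNLPy's actual table.
PUNCTUATION_MAP = {
    '\u201c': '"',    # left double quotation mark
    '\u201d': '"',    # right double quotation mark
    '\u2018': "'",    # left single quotation mark
    '\u2019': "'",    # right single quotation mark
    '\u2026': '...',  # horizontal ellipsis
}


def simplify_punctuation(text):
    """Replace each mark in PUNCTUATION_MAP with its plain equivalent."""
    for fancy, plain in PUNCTUATION_MAP.items():
        text = text.replace(fancy, plain)
    return text
```
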